(57) Abstract

Rapid and highly efficient procedures for the total synthesis of linear,
double stranded DNA sequences in excess of about 200 base pairs in length,
which sequences may comprise entire structural genes. Novel sequences are
prepared from two or more DNA subunits provided with terminal regions
comprising restriction endonuclease cleavage sites facilitating insertion
of subunits into a selected vector for purposes of amplification during
the course of the total assembly process. The total, finally-assembled
sequences include at least one, and preferably two or more, unique
restriction endonuclease cleavage site(s) at intermediate positions along
the sequence, allowing for easy excision and replacement of subunits and
the corresponding facile preparation of multiple structural analogs of
polypeptides coded for by the sequences. Manufactured genes [...]
properties.








WO 83/04053 



PCT/US83/00605






THE MANUFACTURE AND EXPRESSION 
OF LARGE STRUCTURAL GENES 

This is a continuation-in-part of co-pending U.S. Patent Application
Serial No. 375,494, filed May 6, 1982.

The present invention relates generally to the manipulation of genetic
materials and, more particularly, to the manufacture of specific DNA
sequences useful in recombinant procedures to secure the production of
proteins of interest.

Genetic materials may be broadly defined as those chemical substances
which program for and guide the manufacture of constituents of cells and
viruses and direct the responses of cells and viruses. A long chain
polymeric substance known as deoxyribonucleic acid (DNA) comprises the
genetic material of all living cells and viruses except for certain
viruses which are programmed by ribonucleic acids (RNA). The repeating
units in DNA polymers are four different nucleotides, each of which
consists of either a purine (adenine or guanine) or a pyrimidine (thymine
or cytosine) bound to a deoxyribose sugar to which a phosphate group is
attached. Attachment of nucleotides in linear polymeric form is by means
of fusion of the 5' phosphate of one nucleotide to the 3' hydroxyl group
of another. Functional DNA occurs in the form of stable double stranded
associations of single strands of nucleotides (known as
deoxyoligonucleotides), which associations occur by means of hydrogen
bonding between purine and pyrimidine bases [i.e., "complementary"
associations existing either between adenine (A) and thymine (T) or
guanine (G) and cytosine (C)]. By convention, nucleotides are referred to
by the names of their constituent purine or pyrimidine bases, and the
complementary associations of nucleotides in double stranded DNA (i.e.,
A-T and G-C) are referred to as "base pairs". Ribonucleic acid is a
polynucleotide comprising adenine, guanine, cytosine and uracil (U),
rather than thymine, bound to ribose and a phosphate group.
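By way of illustration only, the complementary associations described
above (A-T and G-C) may be sketched in Python as follows. The function
names are hypothetical and form no part of the original disclosure; the
sketch assumes a strand written 5' to 3' using only the letters A, C, G
and T.

```python
# Watson-Crick pairing as described in the text: A pairs with T, G with C.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complement(strand: str) -> str:
    """Return the base-by-base complement, read in the same direction."""
    return "".join(COMPLEMENT[base] for base in strand.upper())


def reverse_complement(strand: str) -> str:
    """Return the opposite strand of the duplex, written 5' to 3'."""
    return complement(strand)[::-1]


def to_rna(strand: str) -> str:
    """Write a DNA sequence in RNA letters (uracil in place of thymine)."""
    return strand.upper().replace("T", "U")
```

For a strand such as ATGC, the opposite strand read in its own 5' to 3'
direction is obtained with `reverse_complement("ATGC")`.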

Most briefly put, the programming function of DNA is generally effected
through a process wherein specific DNA nucleotide sequences (genes) are
"transcribed" into relatively unstable messenger RNA (mRNA) polymers. The
mRNA, in turn, serves as a template for the formation of structural,
regulatory and catalytic proteins from amino acids. This translation
process involves the operations of small RNA strands (tRNA) which
transport and align individual amino acids along the mRNA strand to allow
for formation of polypeptides in proper amino acid sequences. The mRNA
"message", derived from DNA and providing the basis for the tRNA supply
and orientation of any given one of the twenty amino acids for
polypeptide "expression", is in the form of triplet "codons", i.e.,
sequential groupings of three nucleotide bases. In one sense, the
formation of a protein is the ultimate form of "expression" of the
programmed genetic message provided by the nucleotide sequence of a gene.
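The reading of a message in successive triplet codons may be illustrated
by the following minimal Python sketch. It assumes the standard genetic
code, written with DNA letters for the coding strand and one-letter amino
acid symbols; the names `CODON_TABLE`, `transcribe` and `translate` are
hypothetical and are not taken from the original disclosure.

```python
from itertools import product

# Standard genetic code, codons enumerated with bases in the order T, C, A, G.
# "*" marks a termination codon. One-letter amino acid symbols are used.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}


def transcribe(coding_dna: str) -> str:
    """Write the mRNA corresponding to a coding (sense) DNA strand."""
    return coding_dna.upper().replace("T", "U")


def translate(coding_dna: str) -> str:
    """Read successive triplets and return the encoded amino acids.

    Reading stops at the first termination codon; trailing bases that do
    not fill a complete triplet are ignored.
    """
    dna = coding_dna.upper()
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)
```

As a usage example, the four codons TGT TAC TGC CAG read as
Cys-Tyr-Cys-Gln, which `translate("TGTTACTGCCAG")` returns in one-letter
form.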

Certain DNA sequences which usually "precede" a gene in a DNA polymer
provide a site for initiation of the transcription into mRNA. These are
referred to as "promoter" sequences. Other DNA sequences, also usually
"upstream" of (i.e., preceding) a gene in a given DNA polymer, bind
proteins that determine the frequency (or rate) of transcription
initiation. These other sequences are referred to as "regulator"
sequences. Thus, sequences which precede a selected gene (or series of
genes) in a functional DNA polymer and which operate to determine whether
the transcription (and eventual expression) of a gene will take place are
collectively referred to as "promoter/regulator" or "control" DNA
sequences. DNA sequences which "follow" a gene in a DNA polymer and
provide a signal for termination of the transcription into mRNA are
referred to as "terminator" sequences.

A focus of microbiological processing for nearly the last decade has been
the attempt to manufacture industrially and pharmaceutically significant
substances using organisms which do not initially have genetically coded
information concerning the desired product included in their DNA. Simply
put, a gene that specifies the structure of a product is either isolated
from a "donor" organism or chemically synthesized and then stably
introduced into another organism which is preferably a self-replicating
unicellular microorganism. Once this is done, the existing machinery for
gene expression in the "transformed" host cells operates to construct the
desired product.
The art is rich in patent and literature publications relating to
"recombinant DNA" methodologies for the isolation, synthesis,
purification and amplification of genetic materials for use in the
transformation of selected host organisms. U.S. Letters Patent No.
4,237,224 to Cohen, et al., for example, relates to transformation of
procaryotic unicellular host organisms with "hybrid" viral or circular
plasmid DNA which includes selected exogenous DNA sequences. The
procedures of the Cohen, et al. patent first involve manufacture of a
transformation vector by enzymatically cleaving viral or circular plasmid
DNA to form linear DNA strands. Selected foreign DNA strands are also
prepared in linear form through use of similar enzymes. The linear viral
or plasmid DNA is incubated with the foreign DNA in the presence of
ligating enzymes capable of effecting a restoration process and "hybrid"
vectors are formed which include the selected foreign DNA segment
"spliced" into the viral or circular DNA plasmid.

Transformation of compatible unicellular host organisms with the hybrid
vector results in the formation of multiple copies of the foreign DNA in
the host cell population. In some instances, the desired result is simply
the amplification of the foreign DNA and the "product" harvested is DNA.
More frequently, the goal of transformation is the expression by the host
cells of the foreign DNA in the form of large scale synthesis of
isolatable quantities of commercially significant protein or polypeptide
fragments coded for by the foreign DNA. See also, e.g., U.S. Letters
Patent Nos. 4,269,731 (to Shine), 4,273,875 (to Manis) and 4,293,652 (to
Cohen).

The success of procedures such as described in the Cohen, et al. patent
is due in large part to the ready availability of "restriction
endonuclease" enzymes which facilitate the site-specific cleavage of both
the unhybridized DNA vector and, e.g., eukaryotic DNA strands containing
the foreign sequences of interest. Cleavage in a manner providing for the
formation of single stranded complementary "ends" on the double stranded
linear DNA strands greatly enhances the likelihood of functional
incorporation of the foreign DNA into the vector upon "ligating" enzyme
treatment. A large number of such restriction endonuclease enzymes are
currently commercially available [See, e.g., "BRL Restriction
Endonuclease Reference Chart" appearing in the "'81/'82 Catalog" of
Bethesda Research Laboratories, Inc., Gaithersburg, Maryland.]
Verification of hybrid formation is facilitated by chromatographic
techniques which can, for example, distinguish the hybrid plasmids from
non-hybrids on the basis of molecular weight. Other useful verification
techniques involve radioactive DNA hybridization.

Another manipulative "tool" largely responsible for successes in
transformation of procaryotic cells is the use of selectable "marker"
gene sequences. Briefly put, hybrid vectors are employed which contain,
in addition to the desired foreign DNA, one or more DNA sequences which
code for expression of a phenotypic trait capable of distinguishing
transformed from non-transformed host cells. Typical marker gene
sequences are those which allow a transformed procaryotic cell to survive
and propagate in a culture medium containing metals, antibiotics, and
like components which would kill or severely inhibit propagation of
non-transformed host cells.



Successful expression of an exogenous gene in a transformed host
microorganism depends to a great extent on incorporation of the gene into
a transformation vector with a suitable promoter/regulator region present
to insure transcription of the gene into mRNA and other signals which
insure translation of the mRNA message into protein (e.g., ribosome
binding sites). It is not often the case that the "original"
promoter/regulator region of a gene will allow for high levels of
expression in the new host. Consequently, the gene to be inserted must
either be fitted with a new, host-accommodated transcription and
translation regulating DNA sequence prior to insertion or it must be
inserted at a site where it will come under the control of existing
transcription and translation signals in the vector DNA.

It is frequently the case that the insertion of an exogenous gene into,
e.g., a circular DNA plasmid vector, is performed at a site either
immediately following an extant transcription and translation signal or
within an existing plasmid-borne gene coding for a rather large protein
which is the subject of high degrees of expression in the host. In the
latter case, the host's expression of the "fusion gene" so formed results
in high levels of production of a "fusion protein" including the desired
protein sequence (e.g., as an intermediate segment which can be isolated
by chemical cleavage of the large protein). Such procedures not only
insure desired regulation and high levels of expression of the exogenous
gene product but also result in a degree of protection of the desired
protein product from attack by proteases endogenous to the host. Further,
depending on the host organism, such procedures may allow for a kind of
"piggyback" transportation of the desired protein from the host cells
into the cell culture medium, eliminating the need to destroy host cells
for the purpose of isolating the desired product.

While the foregoing generalized descriptions of published recombinant DNA
methodologies may make the processes appear to be rather straightforward,
easily performed and readily verified, it is actually the case that the
DNA sequence manipulations involved are quite painstakingly difficult to
perform and almost invariably characterized by very low yields of desired
products.

As an example, the initial "preparation" of a gene for insertion into a
vector to be used in transformation of a host microorganism can be an
enormously difficult process, especially where the gene to be expressed
is endogenous to a higher organism such as man. One laborious procedure
practiced in the art is the systematic cloning into recombinant plasmids
of the total DNA genome of the "donor" cells, generating immense
"libraries" of transformed cells carrying random DNA sequence fragments
which must be individually tested for expression of a product of
interest. According to another procedure, total mRNA is isolated from
high expression donor cells (presumptively containing multiple copies of
mRNA coding for the product of interest), first "copied" into single
stranded cDNA with reverse transcriptase enzymes, then into double
stranded form with polymerase, and cloned. The procedure again generates
a library of transformed cells, somewhat smaller than a total genome
library, which may include the desired gene copies free of
non-transcribed "introns" which can significantly interfere with
expression by a host microorganism. The above-noted time-consuming gene
isolation procedures were in fact employed in published recombinant DNA
procedures for obtaining microorganism expression of several proteins,
including rat proinsulin [Ullrich, et al., Science, 196, pp. 1313-1318
(1977)], human fibroblast interferon [Goedell, et al., Nucleic Acids
Research, 8, pp. 4087-4094 (1980)], mouse β-endorphin [Shine, et al.,
Nature, 285, pp. 456-461 (1980)] and human leukocyte interferon [Goedell,
et al., Nature, 287, pp. 411-416 (1980); and Goedell, et al., Nature,
290, pp. 20-26 (1981)].

Whenever possible, the partial or total manufacture of genes of interest
from nucleotide bases constitutes a much preferred procedure for
preparation of genes to be used in recombinant DNA methods. A requirement
for such manufacture is, of course, knowledge of the correct amino acid
sequence of the desired polypeptide. With this information in hand, a
generative DNA sequence code for the protein (i.e., a properly ordered
series of base triplet codons) can be planned and a corresponding
synthetic, double stranded DNA segment can be constructed. A combination
of manufacturing and cDNA synthetic methodologies is reported to have
been employed in the generation of a gene for human growth hormone.
Specifically, a manufactured linear double stranded DNA sequence of 72
nucleotide base pairs (comprising codons specifying the first 24 amino
acids of the desired 191 amino acid polypeptide) was ligated to a
cDNA-derived double strand coding for amino acids Nos. 25-191 and
inserted in a modified pBR322 plasmid at a locus controlled by a lac
promoter/regulator sequence [Goedell, et al., Nature, 281, pp. 544-548
(1981)].

Completely synthetic procedures have been employed for the manufacture of
genes coding for relatively "short" biologically functional polypeptides,
such as human somatostatin (14 amino acids) and human insulin (2
polypeptide chains of 21 and 30 amino acids, respectively).

In the somatostatin gene preparative procedure [Itakura, et al., Science,
198, pp. 1056-1063 (1977)] a 52 base pair gene was constructed wherein 42
base pairs represented the codons specifying the required 14 amino acids
and an additional 10 base pairs were added to permit formation of
"sticky-end" single stranded terminal regions employed for ligating the
structural gene into a microorganism transformation vector. Specifically,
the gene was inserted close to the end of a β-galactosidase enzyme gene
and the resultant fusion gene was expressed as a fusion protein from
which somatostatin was isolated by cyanogen bromide cleavage. Manufacture
of the human insulin gene, as noted above, involved preparation of genes
coding for a 21 amino acid chain and for a 30 amino acid chain. Eighteen
deoxyoligonucleotide fragments were combined to make the gene for the
longer chain, and eleven fragments were joined into a gene for the
shorter chain. Each gene was employed to form a fusion gene with a
β-galactosidase gene and the individually expressed polypeptide chains
were enzymatically isolated and linked to form complete insulin
molecules. [Goedell, et al., Proc. Nat. Acad. Sci. U.S.A., 76, pp.
106-110 (1979).]

In each of the above procedures, deoxyoligonucleotide segments were
prepared, and then sequentially ligated according to the following
general procedure. [See, e.g., Agarwal, et al., Nature, 227, pp. 1-7
(1970) and Khorana, Science, 203, pp. 614-675 (1979)]. An initial "top"
(i.e., 5' to 3' polarity) deoxyoligonucleotide segment is enzymatically
joined to a second "top" segment. Alignment of these two "top" strands is
made possible using a "bottom" (i.e., 3' to 5' polarity) strand having a
base sequence complementary to half of the first top strand and half of
the second top strand. After joining, the uncomplemented bases of the top
strands "protrude" from the duplex portion formed. A second bottom strand
is added which includes the five or six base complement of a protruding
top strand, plus an additional five or six bases which then protrude as a
bottom single stranded portion. The two bottom strands are then joined.
Such sequential additions are continued until a complete gene sequence is
developed, with the total procedure being very time-consuming and highly
inefficient.
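The staggered arrangement of "top" and "bottom" strands described above
may be pictured with the following simplified Python sketch. It is an
illustration under stated assumptions and not the procedure of the cited
references: the function name `design_overlapping_oligos` is
hypothetical, all segments are given the same even length, and each
bottom segment is offset by exactly half a segment so that it bridges the
junction between two adjacent top segments.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(strand: str) -> str:
    """Opposite strand of a duplex, written 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(strand.upper()))


def design_overlapping_oligos(top: str, seg_len: int):
    """Split a top strand into segments and derive bridging bottom segments.

    Each bottom segment is complementary to the last half of one top
    segment and the first half of the next, so that it can align the two
    top segments for joining. Returns (top_segments, bottom_segments),
    every segment written 5' to 3'.
    """
    if seg_len % 2 or len(top) % seg_len:
        raise ValueError("seg_len must be even and divide the strand length")
    half = seg_len // 2
    top_segments = [top[i:i + seg_len] for i in range(0, len(top), seg_len)]
    bottom_segments = [
        reverse_complement(top[i:i + seg_len])
        for i in range(half, len(top) - seg_len + 1, seg_len)
    ]
    return top_segments, bottom_segments
```

In this simplified layout, half a segment of top strand is left
uncomplemented at each end of the assembled duplex, corresponding to the
"protruding" bases mentioned in the text.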

The time-consuming characteristics of such methods for total gene
synthesis are exemplified by reports that three months' work by at least
four investigators was needed to perform the assembly of the two "short"
insulin genes previously referred to. Further, while only relatively
small quantities of any manufactured gene are needed for success of
vector insertion, the above synthetic procedures have such poor overall
yields (on the order of 20% per ligation) that the eventual isolation of
even minute quantities of a selected short gene is by no means guaranteed
with even the most scrupulous adherence to prescribed methods. The
maximum length gene which can be synthesized is clearly limited by the
efficiency with which the individual short segments can be joined. If n
such ligation reactions are required and the yield of each such reaction
is y, the quantity of correctly synthesized genetic material obtained
will be proportional to y^n. Since this relationship is exponential in
nature, even a small increase in the yield per ligation reaction will
result in a substantial increase in the length of the largest gene that
may be synthesized.
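The exponential relationship stated above may be made concrete with a
short Python sketch. The function names and the recovery threshold used
in the example are assumptions chosen for illustration; only the relation
that the recovered quantity is proportional to y^n comes from the text.

```python
import math


def overall_yield(y: float, n: int) -> float:
    """Fraction of starting material surviving n ligations of yield y each."""
    return y ** n


def max_ligations(y: float, min_fraction: float) -> int:
    """Largest n for which y**n is still at least min_fraction.

    Assumes 0 < y < 1 and 0 < min_fraction <= 1.
    """
    if not (0.0 < y < 1.0 and 0.0 < min_fraction <= 1.0):
        raise ValueError("expected 0 < y < 1 and 0 < min_fraction <= 1")
    return math.floor(math.log(min_fraction) / math.log(y))
```

For instance, if at least 0.1% of the starting material must be recovered
(an assumed figure), a yield of 20% per ligation permits four ligations,
whereas a yield of 50% per ligation permits nine.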

Inefficiencies in the above-noted methodology are due in large part to
the formation of undesired intermediate products. As an example, in an
initial reaction forming annealed top strands associated with a bottom,
"template" strand, the desired reaction may be

[reaction diagram not reproduced]

but the actual products obtained may be

[product diagrams not reproduced]

or the like. Further, the longer the individual deoxyoligonucleotides
are, the more likely it is that they will form thermodynamically stable
self-associations such as "hairpins" or aggregations.

Proposals for increasing synthetic efficiency have not been forthcoming
and it was recently reported that, "With the methods now available,
however, it is not economically practical to synthesize genes for
peptides longer than about 30 amino acid units, and many clinically
important proteins are much longer". [Aharonowitz, et al., Scientific
American, 245, No. 3, pp. 140-152, at p. 151 (1981).]

An illustration of the "economic practicalities" involved in large gene
synthesis is provided by the recent publication of "successful" efforts
in the total synthesis of a human leukocyte interferon gene. [Edge, et
al., Nature, 292, pp. 756-782 (1981).] Briefly summarized, 67 different
deoxyoligonucleotides containing about 15 bases were synthesized and
joined in the "50 percent overlap" procedure of the type noted above to
form eleven short duplexes. These, in turn, were assembled into four
longer duplexes which were eventually joined to provide a 514 base pair
gene coding for the 166 amino acid protein. The procedure, which the
authors characterize as "rapid", is reliably estimated to have consumed
nearly a year's effort by five workers and the efficiency of the assembly
strategy was clearly quite poor. It may be noted, for example, that while
40 pmole of each of the starting 67 deoxyoligonucleotides was prepared
and employed to form the eleven intermediate-sized duplexes, by the time
assembly of the four large duplexes was achieved, a yield of only about
0.01 pmole of the longer duplexes could be obtained for use in final
assembly of the whole gene.

Another aspect of the practice of recombinant DNA techniques for the
expression, by microorganisms, of proteins of industrial and
pharmaceutical interest is the phenomenon of "codon preference". While it
was earlier noted that the existing machinery for gene expression in
genetically transformed host cells will "operate" to construct a given
desired product, levels of expression attained in a microorganism can be
subject to wide variation, depending in part on specific alternative
forms of the amino acid-specifying genetic code present in an inserted
exogenous gene. A "triplet" codon of four possible nucleotide bases can
exist in 64 variant forms. That these forms provide the message for only
20 different amino acids (as well as transcription initiation and
termination) means that some amino acids can be coded for by more than
one codon. Indeed, some amino acids have as many as six "redundant",
alternative codons while some others have a single, required codon. For
reasons not completely understood, alternative codons are not at all
uniformly present in the endogenous DNA of differing types of cells and
there appears to exist a variable natural hierarchy or "preference" for
certain codons in certain types of cells.
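The counts given above (64 triplets, up to six alternative codons for
some amino acids and a single codon for others) can be checked with a
brief Python sketch. It assumes the standard genetic code with one-letter
amino acid symbols, "*" denoting a termination codon; the variable names
are hypothetical.

```python
from collections import Counter
from itertools import product

# Standard genetic code, codons enumerated with bases in the order T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}

# Number of alternative codons available for each amino acid (or for "*").
DEGENERACY = Counter(CODON_TABLE.values())

# Amino acids grouped by how many codons specify them.
SIX_FOLD = sorted(aa for aa, n in DEGENERACY.items() if n == 6)
SINGLE = sorted(aa for aa, n in DEGENERACY.items() if n == 1)
```

Under this assumption `SIX_FOLD` lists leucine, arginine and serine
(L, R, S) and `SINGLE` lists methionine and tryptophan (M, W).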
As one example, the amino acid leucine is specified by any of six DNA
codons including CTA, CTC, CTG, CTT, TTA, and TTG (which correspond,
respectively, to the mRNA codons, CUA, CUC, CUG, CUU, UUA and UUG).
Exhaustive analysis of genome codon frequencies for microorganisms has
revealed endogenous DNA of E. coli bacteria most commonly contains the
CTG leucine-specifying codon, while the DNA of yeasts and slime molds
most commonly includes a TTA leucine-specifying codon. In view of this
hierarchy, it is generally held that the likelihood of obtaining high
levels of expression of a leucine-rich polypeptide by an E. coli host
will depend to some extent on the frequency of codon use. For example, a
gene rich in TTA codons will in all probability be poorly expressed in E.
coli, whereas a CTG rich gene will probably highly express the
polypeptide. In a like manner, when yeast cells are the projected
transformation host cells for expression of a leucine-rich polypeptide, a
preferred codon for use in an inserted DNA would be TTA. See, e.g.,
Grantham, et al., Nucleic Acids Research, 8, pp. r49-r62 (1980);
Grantham, et al., Nucleic Acids Research, 8, pp. 1893-1912 (1980); and,
Grantham, et al., Nucleic Acids Research, 9, pp. r43-r74 (1981).
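The leucine example above may be sketched in Python as follows. The
sketch assumes a coding sequence read in frame from its first base and
uses only the two host preferences named in the text (CTG for E. coli,
TTA for yeast); the function name `recode_leucine` and the host keys are
hypothetical.

```python
# The six DNA codons for leucine listed in the text.
LEUCINE_CODONS = {"CTA", "CTC", "CTG", "CTT", "TTA", "TTG"}

# Host preferences as stated in the text; the dictionary keys are
# illustrative labels only.
PREFERRED_LEUCINE = {"e_coli": "CTG", "yeast": "TTA"}


def recode_leucine(coding_dna: str, host: str) -> str:
    """Replace every in-frame leucine codon with the host's preferred one.

    Codons for other amino acids are left unchanged, so the encoded
    polypeptide is the same before and after recoding.
    """
    dna = coding_dna.upper()
    if len(dna) % 3:
        raise ValueError("sequence length must be a multiple of three")
    preferred = PREFERRED_LEUCINE[host]
    codons = [dna[i:i + 3] for i in range(0, len(dna), 3)]
    return "".join(
        preferred if codon in LEUCINE_CODONS else codon for codon in codons
    )
```

Because only synonymous codons are exchanged, the same polypeptide can be
specified by differently recoded genes intended for different hosts.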



The implications of codon preference phenomena on recombinant DNA
techniques are manifest, and the phenomenon may serve to explain many
prior failures to achieve high expression levels for exogenous genes in
successfully transformed host organisms: a less "preferred" codon may be
repeatedly present in the inserted gene and the host cell machinery for
expression may not operate as efficiently. This phenomenon directs the
conclusion that wholly manufactured genes which have been designed to
include a projected host cell's preferred codons provide a preferred form
of foreign genetic material for practice of recombinant DNA techniques.
In this context, the absence of procedures for rapid and efficient total
gene manufacture which would permit codon selection is seen to constitute
an even more serious roadblock to advances in the art.



Of substantial interest to the background of the present invention is the
state of the art with regard to the preparation and use of a class of
biologically active substances, the interferons (IFNs). Interferons are
secreted proteins having fairly well-defined antiviral, antitumor and
immunomodulatory characteristics. See, e.g., Gray, et al., Nature, 295,
pp. 503-508 (1982) and Edge, et al., supra, and references cited therein.

On the basis of antigenicity and biological and chemical properties,
human interferons have been grouped into three major classes: IFN-α
(leukocyte), IFN-β (fibroblast) and IFN-γ (immune). Considerable
information has accumulated on the structures and properties of the
virus-induced acid-stable interferons (IFN-α and β). These have been
purified to homogeneity and at least partial amino acid sequences have
been determined. Analyses of cloned cDNA and gene sequences for IFN-β1
and the IFN-α multigene family have permitted the deduction of the
complete amino acid sequences of many of the interferons. In addition,
efficient synthesis of IFN-β1 and several IFN-αs in E. coli, and IFN-α1
in yeast, have now made possible the purification of large quantities of
these proteins in biologically active form.

Much less information is available concerning the structure and
properties of IFN-γ, an interferon generally produced in cultures of
lymphocytes exposed to various mitogenic stimuli. It is acid labile and
does not cross-react with antisera prepared against IFN-α or IFN-β. A
broad range of biological activities have been attributed to IFN-γ
including potentiation of the antiviral activities of IFN-α and -β, from
which it differs in terms of its virus and cell specificities and the
antiviral mechanisms induced. In vitro studies performed with crude
preparations suggest that the primary function of IFN-γ may be as an
immunoregulatory agent. The antiproliferative effect of IFN-γ on
transformed cells has been reported to be 10 to 100-fold greater than
that of IFN-α or -β, suggesting a potential use in the treatment of
neoplasia. Murine IFN-γ preparations have been shown to have significant
antitumor activity against mouse sarcomas.

It has recently been reported (Gray, et al., supra) that a recombinant
plasmid containing a cDNA sequence coding for human IFN-γ has been
isolated and characterized. Expression of this sequence in E. coli and
cultured monkey cells is reported to give rise to a polypeptide having
the properties of authentic human IFN-γ. In the publication, the cDNA
sequence and the deduced 146 amino acid sequence of the "mature"
polypeptide, exclusive of the putative leader sequence, are as follows:

1                                   10
Cys-Tyr-Cys-Gln-Asp-Pro-Tyr-Val-Lys-Glu-Ala-Glu-Asn-Leu-
TGT TAC TGC CAG CAG CAA TAT GTA AAA GAA GCA GAA AAC CTT

                    20
Lys-Lys-Tyr-Phe-Asn-Ala-Gly-His-Ser-Asp-Val-Ala-Asp-Asn-
AAG AAA TAT TTT AAT GCA GGT CAT TCA GAT GTA GCG GAT AAT

    30                                      40
Gly-Thr-Leu-Phe-Leu-Gly-Ile-Leu-Lys-Asn-Trp-Lys-Glu-Glu-
GGA ACT CTT TTC TTA GGC ATT TTG AAG AAT TGG AAA GAG GAG

                            50
Ser-Asp-Arg-Lys-Ile-Met-Gln-Ser-Gln-Ile-Val-Ser-Phe-Tyr-
AGT GAC AGA AAA ATA ATG CAG AGC CAA ATT GTC TCC TTT TAC

            60                                      70
Phe-Lys-Leu-Phe-Lys-Asn-Phe-Lys-Asp-Asp-Gln-Ser-Ile-Gln-
TTC AAA CTT TTT AAA AAC TTT AAA GAT GAC CAG AGC ATC CAA

                                    80
Lys-Ser-Val-Glu-Thr-Ile-Lys-Glu-Asp-Met-Asn-Val-Lys-Phe-
AAG AGT GTG GAG ACC ATC AAG GAA GAC ATG AAT GTC AAG TTT

                    90
Phe-Asn-Ser-Asn-Lys-Lys-Lys-Arg-Asp-Asp-Phe-Glu-Lys-Leu-
TTC AAT AGC AAC AAA AAG AAA CGA GAT GAC TTC GAA AAG CTG

    100                                     110
Thr-Asn-Tyr-Ser-Val-Thr-Asp-Leu-Asn-Val-Gln-Arg-Lys-Ala-
ACT AAT TAT TCG GTA ACT GAC TTG AAT GTC CAA CGC AAA GCA

                            120
Ile-His-Glu-Leu-Ile-Gln-Val-Met-Ala-Glu-Leu-Ser-Pro-Ala-
ATA CAT GAA CTC CTC ATC CAA ATG GCT GAA CTG TCG CAA GCA

            130                                     140
Ala-Lys-Thr-Gly-Lys-Arg-Lys-Arg-Ser-Gln-Met-Leu-Phe-Gln-
GCT AAA ACA GGG AAG CGA AAA AGG AGT CAG ATG CTG TTT CAA

                    146
Gly-Arg-Arg-Ala-Ser-Gln
GGT CGA AGA GCA TCC CAG.

In a previous publication of the sequence, arginine, rather than
glutamine, was specified at position 140 in the sequence. (Unless
otherwise indicated, therefore, reference to "human immune interferon"
or, simply "IFN-γ" shall comprehend both the [Arg140] and [Gln140]
forms.)
The above-noted wide variations in biological activities of various
interferon types make the construction of synthetic polypeptide analogs
of the interferons of paramount significance to the full development of
the therapeutic potential of this class of compounds. Despite the
advantages in isolation of quantities of interferons which have been
provided by recombinant DNA techniques to date, practitioners in this
field have not been able to address the matter of preparation of
synthetic polypeptide analogs of the interferons with any significant
degree of success.

Put another way, the work of Gray, et al., supra, in the isolation of a
gene coding for IFN-γ and the extensive labors of Edge, et al., supra, in
providing a wholly manufactured IFN-α1 gene provide only genetic
materials for expression of single, very precisely defined, polypeptide
sequences. There exist no procedures (except, possibly, for site specific
mutagenesis) which would permit microbial expression of large quantities
of human IFN-γ analogs which differed from the "authentic" polypeptide in
terms of the identity or location of even a single amino acid. In a like
manner, preparation of an IFN-α1 analog which differed by one amino acid
from the polypeptide prepared by Edge, et al., supra, would appear to
require an additional year of labor in constructing a whole new gene
which varied in terms of a single triplet codon. No means is readily
available for the excision of a fragment of the subject gene and
replacement with a fragment including the coding information for a
variant polypeptide sequence. Further, modification of the reported
cDNA-derived and manufactured DNA sequences to vary codon usage is not an
available "option".

Indeed, the only report of the preparation
of variant interferon polypeptide species by recombinant
DNA techniques has been in the context of preparation
and expression of "hybrids" of human genes for IFN-α1
and IFN-α2 [Weck, et al., Nucleic Acids Research,
9, pp. 6153-6168 (1981) and Streuli, et al., Proc.
Nat. Acad. Sci. U.S.A., 78, pp. 2848-2852 (1981)].
The hybrids obtained consisted of the four possible
combinations of gene fragments developed upon finding
that two of the eight human (cDNA-derived) genes
fortuitously included, only once within the sequence,
base sequences corresponding to the restriction
endonuclease cleavage sites for the bacterial
endonucleases PvuII and BglII.

There exists, therefore, a substantial need
in the art for more efficient procedures for the total
synthesis from nucleotide bases of manufactured DNA
sequences coding for large polypeptides such as the
interferons. There additionally exists a need for
synthetic methods which will allow for the rapid
construction of variant forms of synthetic sequences such
as will permit the microbial expression of synthetic
polypeptides which vary from naturally occurring forms
in terms of the identity and/or position of one or
more selected amino acids.
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BRIEF SUMMARY 

The present invention provides novel, rapid
and highly efficient procedures for the total synthesis
of linear, double stranded DNA sequences in excess
of about 200 nucleotide base pairs in length, which
sequences may comprise entire structural genes capable
of directing the synthesis of a wide variety of
polypeptides of interest.

According to the invention, linear, double
stranded DNA sequences of a length in excess of about
200 base pairs and coding for expression of a
predetermined continuous sequence of amino acids within a
selected host microorganism transformed by a selected
DNA vector including the sequence, are synthesized
by a method comprising:

(a) preparing two or more different, subunit,
linear, double stranded DNA sequences of about 100
or more base pairs in length for assembly in a selected
assembly vector,

each different subunit DNA sequence prepared
comprising a series of nucleotide base codons coding
for a different continuous portion of said predetermined
sequence of amino acids to be expressed,

one terminal region of a first of said
subunits comprising a portion of a base sequence which
provides a recognition site for cleavage by a first
restriction endonuclease, which recognition site is
entirely present either once or not at all in said
selected assembly vector upon insertion of the subunit
therein,

one terminal region of a second of said
subunits comprising a portion of a base sequence which
provides a recognition site for cleavage by a second
restriction endonuclease other than said first
endonuclease, which recognition site is entirely present
once or not at all in said selected assembly vector
upon insertion of the subunit therein,

at least one-half of all remaining terminal
regions of subunits comprising a portion of a
recognition site (preferably a palindromic six base
recognition site) for cleavage by a restriction
endonuclease other than said first and second
endonucleases, which recognition site is entirely present
once and only once in said selected assembly vector
after insertion of all subunits thereinto; and

(b) serially inserting each of said subunit
DNA sequences prepared in step (a) into the selected
assembly vector and effecting the biological
amplification of the assembly vector subsequent to each
insertion, thereby to form a DNA vector including the
desired DNA sequence coding for the predetermined
continuous amino acid sequence and wherein the desired
DNA sequence assembled includes at least one unique,
preferably palindromic six base, recognition site for
restriction endonuclease cleavage at an intermediate
position therein.

The above general method preferably further
includes the step of isolating the desired DNA sequence
from the assembly vector, preferably to provide one
of the class of novel manufactured DNA sequences having
at least one unique palindromic six base recognition
site for restriction endonuclease cleavage at an
intermediate position therein. A sequence so isolated
may then be inserted in a different, "expression"
vector and direct expression of the desired polypeptide
by a microorganism which is the same as or different
from that in which the assembly vector is amplified.
In other preferred embodiments of the method: at
least three different subunit DNA sequences are prepared
in step (a) and serially inserted into said selected
assembly vector in step (b), and the desired manufactured
DNA sequence obtained includes at least two unique
palindromic six base recognition sites for restriction
endonuclease cleavage at intermediate positions therein;
the DNA sequence synthesized comprises an entire
structural gene coding for a biologically active
polypeptide; and, in the DNA sequence manufactured, the
sequence of nucleotide bases includes one or more codons
selected from among alternative codons specifying the
same amino acid, on the basis of preferential expression
characteristics of the codon in said selected host
microorganism.

Novel products of the invention include
manufactured, linear, double stranded DNA sequences
of a length in excess of about 200 base pairs and
coding for the expression of a predetermined continuous
sequence of amino acids by a selected host microorganism
transformed with a selected DNA vector including the
sequence, characterized by having at least one unique
palindromic six base recognition site for restriction
endonuclease cleavage at an intermediate position
therein. Also included are polypeptide products of
the expression by an organism of such manufactured
sequences.

Illustratively provided by the present
invention are novel manufactured genes coding for the
synthesis of human immune interferon (IFN-γ) and novel
biologically functional analog polypeptides which
differ from human immune interferon in terms of the
identity and/or location of one or more amino acids.
Also provided are manufactured genes coding for
synthesis of human leukocyte interferon of the F subtype
("LeIFN-F" or "IFN-αF") and analogs thereof, along
with consensus human leukocyte interferons.

DNA subunit sequences for use in practice
of the methods of the invention are preferably
synthesized from nucleotide bases according to the methods
disclosed in co-owned, concurrently-filed U.S. Patent
Application Serial No. 375,493, by Yitzhak Stabinsky,
entitled "Manufacture and Expression of Structural
Genes" (Attorney's Docket No. 6250). Briefly summarized,
the general method comprises the steps of:

(1) preparing two or more different, linear,
duplex DNA strands, each duplex strand including a
double stranded region of 12 or more selected
complementary base pairs and further including a top single
stranded terminal sequence of from 3 to 7 selected
bases at one end of the strand and/or a bottom single
stranded terminal sequence of from 3 to 7 selected
bases at the other end of the strand, each single
stranded terminal sequence of each duplex DNA strand
comprising the entire base complement of at most one
single stranded terminal sequence of any other duplex
DNA strand prepared; and

(2) annealing each duplex DNA strand prepared
in step (1) to one or two different duplex strands
prepared in step (1) having a complementary single
stranded terminal sequence, thereby to form a single
continuous double stranded DNA sequence which has
a duplex region of at least 27 selected base pairs
including at least 3 base pairs formed by complementary
association of single stranded terminal sequences
of duplex DNA strands prepared in step (1) and which
has from 0 to 2 single stranded top or bottom terminal
regions of from 3 to 7 bases.
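
The condition in step (1), that each single stranded terminal sequence complement at most one other, can be checked mechanically before synthesis. The following is a minimal sketch, not part of the disclosed method; the function names and the example overhangs are hypothetical, and each overhang is assumed to be written 5' to 3'.

```python
# Illustrative sketch only: names and example overhangs are hypothetical.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the complementary strand, read 5' to 3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def complement_partners(overhangs):
    """Map each overhang index to the other overhangs that fully complement it."""
    partners = {}
    for i, overhang in enumerate(overhangs):
        target = reverse_complement(overhang)
        partners[i] = [
            j for j, other in enumerate(overhangs)
            if j != i and other.upper() == target
        ]
    return partners


def assembly_is_unambiguous(overhangs):
    """True if every overhang is complemented by at most one other overhang."""
    return all(len(p) <= 1 for p in complement_partners(overhangs).values())


# Two complementary pairs: each overhang has exactly one partner.
good = ["AGCTG", "CAGCT", "GGTAC", "GTACC"]
# The first overhang could anneal to either of two identical partners.
bad = ["AGCTG", "CAGCT", "CAGCT"]
print(assembly_is_unambiguous(good), assembly_is_unambiguous(bad))
```

A design that fails this check would allow duplex strands to anneal in more than one order in a single reaction mixture.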

In the preferred general process for subunit
manufacture, at least three different duplex DNA strands
are prepared in step (1) and all strands so prepared
are annealed concurrently in a single annealing reaction
mixture to form a single continuous double stranded
DNA sequence which has a duplex region of at least
42 selected base pairs including at least two
non-adjacent sets of 3 or more base pairs formed by
complementary association of single stranded terminal
sequences of duplex strands prepared in step (1).

The duplex DNA strand preparation step (1)
of the preferred subunit manufacturing process
preferably comprises the steps of:

(a) constructing first and second linear
deoxyoligonucleotide segments having 15 or more bases
in a selected linear sequence, the linear sequence
of bases of the second segment comprising the total
complement of the sequence of bases of the first segment
except that at least one end of the second segment
shall either include an additional linear sequence
of from 3 to 7 selected bases beyond those fully
complementing the first segment, or shall lack a linear
sequence of from 3 to 7 bases complementary to a terminal
sequence of the first segment, provided, however,
that the second segment shall not have an additional
sequence of bases or be lacking a sequence of bases
at both of its ends; and,

(b) combining the first and second segments
under conditions conducive to complementary association
between segments to form a linear, duplex DNA strand.

The sequence of bases in the double stranded
DNA subunit sequences formed preferably includes one
or more triplet codons selected from among alternative
codons specifying the same amino acid on the basis
of preferential expression characteristics of the
codon in a projected host microorganism, such as yeast
cells or bacteria, especially E. coli bacteria.

Also provided by the present invention are
improvements in methods and materials for enhancing
levels of expression of selected exogenous genes in
E. coli host cells. Briefly stated, expression vectors
are constructed to include selected DNA sequences
upstream of polypeptide coding regions, which selected
sequences are duplicative of ribosome binding site
sequences extant in genomic E. coli DNA associated
with highly expressed endogenous polypeptides. A
presently preferred selected sequence is duplicative
of the ribosome binding site sequence associated with
E. coli expression of outer membrane protein F ("OMP-F").

Other aspects and advantages of the present 
invention will be apparent upon consideration of the 
following detailed description thereof. 

DETAILED DESCRIPTION

As employed herein, the term "manufactured"
as applied to a DNA sequence or gene shall designate
a product either totally chemically synthesized by
assembly of nucleotide bases or derived from the
biological replication of a product thus chemically
synthesized. As such, the term is exclusive of products
"synthesized" by cDNA methods or genomic cloning
methodologies which involve starting materials which are
of biological origin. Table I below sets out
abbreviations employed herein to designate amino acids and
includes IUPAC-recommended single letter designations.
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TABLE I 

Amino Acid        Abbreviation    IUPAC Symbol

Alanine           Ala             A
Cysteine          Cys             C
Aspartic acid     Asp             D
Glutamic acid     Glu             E
Phenylalanine     Phe             F
Glycine           Gly             G
Histidine         His             H
Isoleucine        Ile             I
Lysine            Lys             K
Leucine           Leu             L
Methionine        Met             M
Asparagine        Asn             N
Proline           Pro             P
Glutamine         Gln             Q
Arginine          Arg             R
Serine            Ser             S
Threonine         Thr             T
Valine            Val             V
Tryptophan        Trp             W
Tyrosine          Tyr             Y

The following abbreviations shall be employed
for nucleotide bases: A for adenine; G for guanine;
T for thymine; U for uracil; and C for cytosine.
For ease of understanding of the present invention,
Tables II and III below provide tabular correlations
between the 64 alternate triplet nucleotide base codons
of DNA and the 20 amino acids and transcription
termination ("stop") functions specified thereby. In order
to determine the corresponding correlations for RNA,
U is substituted for T in the tables.
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TABLE II 






FIRST                SECOND POSITION                THIRD
POSITION      T        C        A        G         POSITION

   T         Phe      Ser      Tyr      Cys           T
             Phe      Ser      Tyr      Cys           C
             Leu      Ser      Stop     Stop          A
             Leu      Ser      Stop     Trp           G

   C         Leu      Pro      His      Arg           T
             Leu      Pro      His      Arg           C
             Leu      Pro      Gln      Arg           A
             Leu      Pro      Gln      Arg           G

   A         Ile      Thr      Asn      Ser           T
             Ile      Thr      Asn      Ser           C
             Ile      Thr      Lys      Arg           A
             Met      Thr      Lys      Arg           G

   G         Val      Ala      Asp      Gly           T
             Val      Ala      Asp      Gly           C
             Val      Ala      Glu      Gly           A
             Val      Ala      Glu      Gly           G
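
As an illustration of how Table II is read, the sketch below builds a codon lookup from the table and translates a short DNA sequence. It is a hedged example rather than part of the specification; the names `CODON_TABLE` and `translate` are hypothetical, and the input is the six-codon carboxy terminal segment of the IFN-γ coding sequence quoted earlier in this document.

```python
from itertools import product

# Base order used by Table II for the first, second and third positions.
BASES = "TCAG"

# Amino acids listed with the third position varying fastest, then the
# second position, then the first, as read from Table II.
AMINO_ACIDS = (
    "Phe Phe Leu Leu Ser Ser Ser Ser Tyr Tyr Stop Stop Cys Cys Stop Trp "
    "Leu Leu Leu Leu Pro Pro Pro Pro His His Gln Gln Arg Arg Arg Arg "
    "Ile Ile Ile Met Thr Thr Thr Thr Asn Asn Lys Lys Ser Ser Arg Arg "
    "Val Val Val Val Ala Ala Ala Ala Asp Asp Glu Glu Gly Gly Gly Gly"
).split()

# product() also varies its last element fastest, matching the list above.
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate a DNA coding strand (5' to 3') into three-letter residues."""
    dna = dna.replace(" ", "").upper()
    if len(dna) % 3:
        raise ValueError("sequence length must be a multiple of three")
    return [CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3)]


print("-".join(translate("GGT CGA AGA GCA TCC CAG")))
```

For RNA, the same lookup applies after substituting T for U in the input.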
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TABLE III 



Amino Acid              Specifying Codon(s)

(A)  Alanine            GCT, GCC, GCA, GCG
(C)  Cysteine           TGT, TGC
(D)  Aspartic acid      GAT, GAC
(E)  Glutamic acid      GAA, GAG
(F)  Phenylalanine      TTT, TTC
(G)  Glycine            GGT, GGC, GGA, GGG
(H)  Histidine          CAT, CAC
(I)  Isoleucine         ATT, ATC, ATA
(K)  Lysine             AAA, AAG
(L)  Leucine            TTA, TTG, CTT, CTC, CTA, CTG
(M)  Methionine         ATG
(N)  Asparagine         AAT, AAC
(P)  Proline            CCT, CCC, CCA, CCG
(Q)  Glutamine          CAA, CAG
(R)  Arginine           CGT, CGC, CGA, CGG, AGA, AGG
(S)  Serine             TCT, TCC, TCA, TCG, AGT, AGC
(T)  Threonine          ACT, ACC, ACA, ACG
(V)  Valine             GTT, GTC, GTA, GTG
(W)  Tryptophan         TGG
(Y)  Tyrosine           TAC, TAT
     STOP               TAA, TAG, TGA
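
The redundancy tabulated above is what gives the gene designer latitude in codon selection. As a hedged illustration (the dictionary and function names are hypothetical, and the codon sets are those implied by Table II), the sketch below counts how many distinct DNA sequences encode a given peptide.

```python
from math import prod

# Synonymous codons per amino acid, consistent with Table II.
SYNONYMOUS_CODONS = {
    "Ala": ["GCT", "GCC", "GCA", "GCG"],
    "Cys": ["TGT", "TGC"],
    "Asp": ["GAT", "GAC"],
    "Glu": ["GAA", "GAG"],
    "Phe": ["TTT", "TTC"],
    "Gly": ["GGT", "GGC", "GGA", "GGG"],
    "His": ["CAT", "CAC"],
    "Ile": ["ATT", "ATC", "ATA"],
    "Lys": ["AAA", "AAG"],
    "Leu": ["TTA", "TTG", "CTT", "CTC", "CTA", "CTG"],
    "Met": ["ATG"],
    "Asn": ["AAT", "AAC"],
    "Pro": ["CCT", "CCC", "CCA", "CCG"],
    "Gln": ["CAA", "CAG"],
    "Arg": ["CGT", "CGC", "CGA", "CGG", "AGA", "AGG"],
    "Ser": ["TCT", "TCC", "TCA", "TCG", "AGT", "AGC"],
    "Thr": ["ACT", "ACC", "ACA", "ACG"],
    "Val": ["GTT", "GTC", "GTA", "GTG"],
    "Trp": ["TGG"],
    "Tyr": ["TAC", "TAT"],
}


def count_encodings(peptide):
    """Number of distinct DNA sequences that encode the given residues."""
    return prod(len(SYNONYMOUS_CODONS[residue]) for residue in peptide)


# Six-residue segment quoted earlier in this document.
segment = ["Gly", "Arg", "Arg", "Ala", "Ser", "Gln"]
print(count_encodings(segment))
```

Even a six-residue segment admits thousands of encodings, which is why terminal sequences and restriction sites can usually be placed without changing the polypeptide.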





A "palindromic" recognition site for
restriction endonuclease cleavage of double stranded DNA
is one which displays "left-to-right and right-to-left"
symmetry between top and bottom base complements,
i.e., where "readings" of complementary base sequences
of the recognition site from 5' to 3' ends are identical.
An example of a palindromic six base recognition site
for restriction endonuclease cleavage is the site for
cleavage by HindIII, wherein top and bottom strands
both read from 5' to 3' as AAGCTT. A non-palindromic
six base restriction site is exemplified by the site
for cleavage by EcoP15, the top strand of which
reportedly reads CAGCAG. The bottom strand base complement,
when read 5' to 3', is CTGCTG. Essentially by definition,
restriction sites comprising odd numbers of bases
(e.g., 5, 7) are non-palindromic. Certain endonucleases
will cleave at variant forms of a site, which may
be palindromic or not. For example, XhoII will
recognize a site which reads (any purine) GATC (any
pyrimidine), including the palindromic sequence AGATCT and
the non-palindromic sequence GGATCT. Referring to the

previously-noted "BRL Restriction Endonuclease Reference
Chart," endonucleases recognizing six base palindromic
sites exclusively include BbrI, ChuI, Hin173, Hin91R,
HinbIII, HinbIII, HindIII, HinfII, HsuI, BglII, StuI,
RruI, ClaI, AvaIII, PvuII, SmaI, XmaI, EccI, SacII,
SboI, SbrI, ShyI, SstII, TglI, AvrII, PvuI, RshI,
RspI, XniI, XorII, XmaIII, BluI, MsiI, ScuI, SexI,
SgoI, SlaI, SluI, SpaI, XhoI, XpaI, Bce170, Bsu1247,
PstI, SalPI, XmaII, XorI, EcoRI, Rsh630I, SacI, SstI,
SphI, BamHI, BamKI, BamNI, BamFI, BstI, KpnI, SalI,
XamI, HpaI, XbaI, AtuCI, BclI, CpeI, SstIV, AosI,
MstI, BalI, AsuII, and MlaI. Endonucleases which
recognize only non-palindromic six base sequences
exclusively include Tth111II, EcoP15, AvaI, and AvrI.
Endonucleases recognizing both palindromic and
non-palindromic six base sequences include HaeI, HgiAI,
AcyI, AosII, AsuIII, AccI, ChuII, HincII, HindII,
MnnI, XhoII, HaeII, HinHI, NgoI, and EcoRI'.
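
The definition of a palindromic site given above reduces to a simple test: a site is palindromic when it equals its own reverse complement. The following minimal sketch applies that test to the sequences cited in the text; the function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(site):
    """Bottom strand of a site, read 5' to 3'."""
    return site.upper().translate(COMPLEMENT)[::-1]


def is_palindromic(site):
    """True when top and bottom strands read identically from 5' to 3'."""
    site = site.upper()
    return site == reverse_complement(site)


for site in ("AAGCTT", "CAGCAG", "AGATCT", "GGATCT"):
    print(site, reverse_complement(site), is_palindromic(site))
```

A site with an odd number of bases can never pass this test, because its central base would have to be its own complement.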

Upon determination of the structure of a
desired polypeptide to be produced, practice of the
present invention involves: preparation of two or
more different specific, continuous double stranded
DNA subunit sequences of 100 or more base pairs in
length and having terminal portions of the proper
configuration; serial insertion of subunits into a
selected assembly vector with intermediate
amplification of the hybrid vectors in a selected host organism;
use of the assembly vector (or an alternate, selected
"expression" vector including the DNA sequence which
has been manufactured from the subunits) to transform
a suitable, selected host; and, isolating polypeptide
sequences expressed in the host organism. In its
most efficient forms, practice of the invention involves
using the same vector for assembly of the manufactured
sequence and for large scale expression of the
polypeptide. Similarly, the host microorganism employed
for expression will ordinarily be the same as employed
for amplifications performed during the subunit assembly
process.

The manufactured DNA sequence may be provided
with a promoter/regulator region for autonomous control
of expression or may be incorporated into a vector
in a manner providing for control of expression by
a promoter/regulator sequence extant in the vector.
Manufactured DNA sequences of the invention may suitably
be incorporated into existing plasmid-borne genes
(e.g., β-galactosidase) to form fusion genes coding
for fusion polypeptide products including the desired
amino acid sequences coded for by the manufactured
DNA sequences.

In practice of the invention in its preferred
forms, polypeptides produced may vary in size from
about 65 or 70 amino acids up to about 200 or more
amino acids. High levels of expression of the desired
polypeptide by selected transformed host organisms
are facilitated through the manufacture of DNA sequences
which include one or more alternative codons which
are preferentially expressed by the host.

Manufacture of double stranded subunit DNA
sequences of 100 to 200 base pairs in length may proceed
according to prior art assembly methods previously
referred to, but is preferably accomplished by means
of the rapid and efficient procedures disclosed in
the aforementioned U.S. Application S.N. 375,493 by
Stabinsky and used in certain of the following
examples of actual practice of the present invention.
Briefly put, these procedures involve the assembly
from deoxyoligonucleotides of two or more different,
linear, duplex DNA strands each including a relatively
long double stranded region along with a relatively
short single stranded region on one or both opposing
ends of the double strand. The double stranded regions
are designed to include codons needed to specify assembly
of an initial, or terminal or intermediate portion
of the total amino acid sequence of the desired
polypeptide. Where possible, alternative codons
preferentially expressed by a projected host (e.g., E. coli)
are employed. Depending on the relative position to be
assumed in the finally assembled subunit DNA sequence,
the single stranded region(s) of the duplex strands
will include a sequence of bases which, when complemented
by bases of other duplex strands, also provide codons
specifying amino acids within the desired polypeptide
sequence.

Duplex strands formed according to this
procedure are then enzymatically annealed to the one
or two different duplex strands having complementary
short, single stranded regions to form a desired
continuous double stranded subunit DNA sequence which
codes for the desired polypeptide fragment.

High efficiencies and rapidity in total
sequence assembly are augmented in such procedures
by performing a single annealing reaction involving
three or more duplex strands, the short, single stranded
regions of which constitute the base complement of
at most one other single stranded region of any other
duplex strand. Providing all duplex strands formed
with short single stranded regions which uniquely
complement only one of the single stranded regions
of any other duplex is accomplished by alternative
codon selection within the context of genetic code
redundancy, and preferably also in the context of
codon preferences of the projected host organism.

The following description of the manufacture
of a hypothetical long DNA sequence coding for a
hypothetical polypeptide will serve to graphically
illustrate practice of the invention, especially in the
context of formation of proper terminal sequences
on subunit DNA sequences.

A biologically active polypeptide of interest
is isolated and its amino acids are sequenced to reveal
a constitution of 100 amino acid residues in a given
continuous sequence. Formation of a manufactured
gene for microbial expression of the polypeptide will
thus require assembly of at least 300 base pairs for
insertion into a selected viral or circular plasmid
DNA vector to be used for transformation of a selected
host organism.

A preliminary consideration in construction
of the manufactured gene is the identity of the projected
microbial host, because foreknowledge of the host
allows for codon selection in the context of codon
preferences of the host species. For purposes of
this discussion, the selection of an E. coli bacterial
host is posited.

A second consideration in construction of
the manufactured gene is the identity of the projected
DNA vector employed in the assembly process. Selection
of a suitable vector is based on existing knowledge
of sites for cleavage of the vector by restriction
endonuclease enzymes. More particularly, the assembly
vector is selected on the basis of including DNA
sequences providing endonuclease cleavage sites which will
permit easy insertion of the subunits. In this regard,
the assembly vector selected preferably has at least
two restriction sites which occur only once (i.e.,
are "unique") in the vector prior to performance of
any subunit insertion processes. For the purposes
of this description, the selection of a hypothetical
circular DNA plasmid pBR3000 having a single EcoRI
restriction site, i.e.,

     5'-GAATTC-3'
     3'-CTTAAG-5'

and a single PvuII restriction site, i.e.,

     5'-CAGCTG-3'
     3'-GTCGAC-5'

is posited.
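
Whether a recognition site is "unique" in a circular vector can be checked by counting its occurrences, including any occurrence that spans the arbitrary origin of the written sequence. The sketch below is illustrative only: the function names are hypothetical and the short plasmid sequences are invented toy data, not the sequence of any real vector.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Complementary strand, read 5' to 3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def count_sites(plasmid, site, circular=True):
    """Count occurrences of a recognition site on either strand."""
    seq, site = plasmid.upper(), site.upper()
    if circular:
        # Append the first bases so that sites spanning the origin are seen.
        seq += seq[:len(site) - 1]
    # For a palindromic site both strands give the same target.
    targets = {site, reverse_complement(site)}
    return sum(
        seq[i:i + len(site)] in targets
        for i in range(len(seq) - len(site) + 1)
    )


# Toy stand-in for the hypothetical pBR3000 (invented sequence).
toy_plasmid = "ATGAATTCGGCCCAGCTGTTAC"
print(count_sites(toy_plasmid, "GAATTC"))  # EcoRI site
print(count_sites(toy_plasmid, "CAGCTG"))  # PvuII site
print(count_sites(toy_plasmid, "AAGCTT"))  # HindIII site
```

The same count, run again after each subunit insertion, indicates whether a site is present once, more than once, or not at all in the resulting hybrid.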
The amino acid sequence of the desired
polypeptide is then analyzed in the context of determining
availability of alternate codons for given amino acids
(preferably in the context of codon preferences of
the projected E. coli host). With this information
in hand, two subunit DNA sequences are designed,
preferably having a length on the order of about 150 base
pairs, each coding for approximately one-half of
the total amino acid sequence of the desired
polypeptide. For purposes of this description, the two
subunits manufactured will be referred to as "A" and
"B".

The methods of the present invention, as
applied to two such subunits, generally call for:
insertion of one of the subunits into the assembly
vector; amplification of the hybrid vector formed;
and insertion of the second subunit to form a second
hybrid including the assembled subunits in the proper
sequence. Because the method involves joining the
two subunits together in a manner permitting the joined
ends to provide a continuous preselected sequence
of bases coding for a continuous preselected sequence
of amino acids, there exist certain requirements
concerning the identity and sequence of the bases which make
up the terminal regions of the manufactured subunits
which will be joined to another subunit. Because
the method calls for joining subunits to the assembly
vector, there exist other requirements concerning
the identity and sequence of the bases which make
up those terminal regions of the manufactured subunits
which will be joined to the assembly vector. Because
the subunits are serially, rather than concurrently,
inserted into the assembly vector (and because the
methods are most beneficially practiced when the subunits
can be selectively excised from assembled form to
allow for alterations in selected base sequences therein),
still further requirements exist concerning the identity
of the bases in terminal regions of subunits manufactured.
For ease of understanding in the following discussion
of terminal region characteristics, the opposing
terminal regions of subunits A and B are respectively
referred to as A-1 and A-2, and B-1 and B-2, viz:

     B-2          B-1     A-2          A-1
      |------------|       |------------|
            B                    A



Assume that an assembly strategy is developed
wherein subunit A is to be inserted into pBR3000 first,
with terminal region A-1 to be ligated to the vector
at the EcoRI restriction site. In the simplest case,
the terminal region is simply provided with an EcoRI
"sticky end", i.e., a single strand of four bases
(-AATT- or -TTAA-) which will complement a single
stranded sequence formed upon EcoRI digestion of pBR3000.
This will allow ligation of terminal region A-1 to
the vector upon treatment with ligase enzyme. Unless
the single strand at the end of terminal region A-1
is preceded by an appropriate base pair (e.g., a top
strand 5'-G- paired over a bottom strand 3'-CTTAA-),
the entire recognition site will not be reconstituted
upon ligation to the vector. Whether or not the EcoRI
recognition site is reconstituted upon ligation (i.e.,
whether or not there will be 0 or 1 EcoRI sites
remaining after insertion of subunit A into the vector)
is at the option of the designer of the strategy.
Alternatively, one may construct the terminal region
A-1 of subunit A to include a complete set of base
pairs providing a recognition site for some other
endonuclease, hypothetically designated "XXX", and then
add on portions of the EcoRI recognition site as above
to provide an EcoRI "linker". To be of practical
use in excising subunit A from an assembled sequence,
the "XXX" site should not appear elsewhere in the
hybrid plasmid formed upon insertion. The requirement
for construction of terminal region A-1 is, therefore,
that it comprise a portion (i.e., all or part) of
a base sequence which provides a recognition site
for cleavage by a restriction endonuclease, which
recognition site is entirely present either once or
not at all in the assembly vector upon insertion of
the subunit.
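
The designer's option described above, namely whether 0 or 1 EcoRI sites remain after ligation, depends only on the base pair that precedes the sticky end. The following is a simplified sketch that models top strands only; the function names are hypothetical, and the model assumes the vector arm contributes the 5'-AATTC- portion of the site.

```python
ECORI_SITE = "GAATTC"


def junction_top_strand(subunit_end_base):
    """Top-strand sequence across the junction after ligation.

    The subunit duplex is assumed to end in `subunit_end_base` on its top
    strand, with a four base bottom-strand overhang that pairs with the
    5'-AATT- overhang left on the EcoRI-digested vector arm (5'-AATTC...).
    """
    return subunit_end_base.upper() + "AATTC"


def site_reconstituted(subunit_end_base):
    """True when ligation regenerates a complete EcoRI recognition site."""
    return junction_top_strand(subunit_end_base) == ECORI_SITE


for base in "GACT":
    print(base, junction_top_strand(base), site_reconstituted(base))
```

Under these assumptions only a G:C pair in that position regenerates the site; any other pair leaves a ligated junction that EcoRI no longer cleaves.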

Assume that terminal region B-2 of subunit
B is also to be joined to the assembly vector (e.g.,
at the single recognition site for PvuII cleavage
present on pBR3000). The requirements for construction
of terminal region B-2 are the same as for construction
of A-1, except that the second endonuclease enzyme
in reference to which the construction of B-2 is made
must be different from that with respect to which
the construction of A-1 is made. If recognition sites
are the same, one will not be able to separately excise
segments A and B from the fully assembled sequence.

The above assumptions require, then, that
terminal region A-2 is to be ligated to terminal region
B-1 in the final pBR3000 hybrid. Either the terminal
region A-2 or the terminal region B-1 is constructed
to comprise a portion of a (preferably palindromic
six base) recognition site for restriction endonuclease
cleavage by hypothetical third endonuclease "YYY",
which recognition site will be entirely present once
and only once in the expression vector upon insertion
of all subunits thereinto, i.e., at an intermediate
position in the assemblage of subunits. There exist
a number of strategies for obtaining this result.
In one alternative strategy, the entire recognition
site of "YYY" is contained in terminal region A-2
and the region additionally includes the one or more
portions of other recognition sites for endonuclease
cleavage needed to (1) complete the insertion of subunit
A into the assembly vector for amplification purposes,
and (2) allow for subsequent joining of subunit A
to subunit B. In this case, terminal region B-1 would
have at its end only the bases necessary to link it
to terminal region A-2. In another alternative, the
entire "YYY" recognition site is included in terminal
region B-1 and B-1 further includes at its end a portion
of a recognition site for endonuclease cleavage which
is useful for joining subunit A to subunit B.

As another alternative, terminal region
B-1 may contain at its end a portion of the "YYY"
recognition site. Terminal region A-2 would then
contain the entire "YYY" recognition site plus, at
its end, a suitable "linker" for joining A-2 to the
assembly vector prior to amplification of subunit
A (e.g., a PvuII "sticky end"). After amplification
of the hybrid containing subunit A, the hybrid would
be cleaved with "YYY" (leaving a sticky-ended portion
of the "YYY" recognition site exposed on the end of
A-2) and subunit B could be inserted with its B-1
terminal region joined with the end of terminal region
A-2 to reconstitute the entire "YYY" recognition site.
The requirement for construction of the terminal regions
of all segments (other than A-1 and B-2) is that one
or the other or both (i.e., "at least half") comprise
a portion (i.e., include all or part) of a recognition
site for third restriction endonuclease cleavage,
which recognition site is entirely present once and
only once (i.e., is "unique") in said assembly vector
after insertion of all subunits thereinto. To generate
a member of the class of novel DNA sequences of the
invention, the recognition site of the third endonuclease
should be a six base palindromic recognition site.

While a subunit "terminal region" as referred
to above could be considered to extend from the subunit
end fully halfway along the subunit to its center,
as a practical matter the constructions noted would
ordinarily be performed in the final 10 or 20 bases.
Similarly, while the unique "intermediate" recognition
site in the two subunit assemblage may be up to three
times closer to one end of the manufactured sequence
than it is to the other, it will ordinarily be located
near the center of the sequence. If, in the above
description, a synthetic plan was generated calling
for preparation of three subunits to be joined, the
manufactured gene would include two unique restriction
enzyme cleavage sites in intermediate positions, at
least one of which will have a palindromic six base
recognition site in the class of new DNA sequences
of the invention.

The significant advantages of the above-described process
are manifest. Because the manufactured gene now includes
one or more unique restriction endonuclease cleavage
sites at intermediate positions along its length,
modifications in the codon sequence of the two subunits
joined at the cleavage site may be effected with great
facility and without the need to re-synthesize the entire
manufactured gene.

Following are illustrative examples of the actual
practice of the invention in formation of manufactured
genes capable of directing the synthesis of: human immune
interferon (IFN-γ) and analogs thereof; human leukocyte
interferon of the F subtype (IFN-αF) and analogs thereof;
and multiple consensus leukocyte interferons which, due
to homology to IFN-αF, can be named as IFN-αF analogs.
It will be apparent from these examples that the gene
manufacturing methodology of the present invention
provides an overall synthetic strategy for the truly
rapid, efficient synthesis and expression of genes of a
length in excess of 200 base pairs within a highly
flexible framework allowing for variations in the
structures of products to be expressed which has not
heretofore been available to investigators practicing
recombinant DNA techniques.


EXAMPLE 1 

In the procedure for construction of synthetic genes for
expression of human IFN-γ, a first selection made was the
choice of E. coli as a microbial host for eventual
expression of the desired polypeptides. Thereafter, codon
selection procedures were carried out in the context of
E. coli codon preferences enumerated in the Grantham
publications, supra. A second selection made was the
choice of pBR322 as an expression vector and,
significantly, as the assembly vector to be employed in
amplification of subunit sequences. In regard to the
latter factor, the plasmid was selected with the
knowledge that it included single BamHI, HindIII, and
SalI restriction sites. With these restriction sites and
the known sequence of amino acids in human immune
interferon in mind, a general plan for formation of three
"major" subunit DNA sequences (IF-3, IF-2 and IF-1) and
one "minor" subunit DNA sequence (IF-4) was evolved. This
plan is illustrated by Table IV below.

The "minor" sequence (IF-4) is seen to include codons for
the 4th through 1st (5'-TGT TAC TGC CAG) amino acids and
an ATG codon for an initiating methionine [Met-1]. In
this construction, it also includes additional bases to
provide a portion of a control sequence involved in an
expression vector assembly from pBR322 as described infra.

An alternative form of subunit IF-1, for use in synthesis
of a manufactured gene for [Arg140]IFN-γ, included the
codon 5'-CGT in place of 5'-CAG (for [Gln140]) at the
codon site specifying the 140th amino acid.

The codon sequence plan for the top strand of the
polypeptide-specifying portion of the total DNA sequence
synthesized was as follows:

5'-TGT-TAC-TGC-CAG-GAT-CCG-TAC-GTT-AAG-GAA-GCA-GAA-
AAC-CTG-AAA-AAA-TAC-TTC-AAC-GCA-GGC-CAC-TCC-GAC-GTA-
GCT-GAT-AAC-GGC-ACC-CTG-TTC-CTG-GGT-ATC-CTA-AAA-AAC-
TGG-AAA-GAG-GAA-TCC-GAC-CTG-AAG-ATC-ATG-CAG-TCT-CAA-
ATT-GTA-AGC-TTC-TAC-TTC-AAA-CTG-TTC-AAG-AAC-TTC-AAA-
GAC-GAT-CAA-TCC-ATC-CAG-AAG-AGC-GTA-GAA-ACT-ATT-AAG-
GAG-GAC-ATG-AAC-GTA-AAA-TCC-TTT-AAC-AGC-AAC-AAG-AAG-
AAA-CGC-GAT-GAC-TTC-GAG-AAA-CTG-ACT-AAC-TAC-TCT-GTT-
ACA-GAT-CTG-AAC-GTG-CAG-CGT-AAA-GCT-ATT-CAC-GAA-CTG-
ATC-CAA-GTT-ATG-GCT-GAA-CTG-TCT-CCT-GCG-GCA-AAG-ACT-
GGC-AAA-CGC-AAG-CGT-AGC-CAG-ATG-CTG-TTT-CAG-[or CGT]-
CGT-CGC-CGT-GCT-TCT-CAG.

In the above sequence, the control sequence bases and
the initial methionine-specifying codon are not
illustrated, nor are termination sequences or sequences
providing a terminal SalI restriction site. Vertical
lines separate top strand portions attributable to each
of the subunit sequences.
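
By way of illustration only, the codon plan can be handled
as a list of triplets so that codon positions referred to
in the examples (e.g., position 81 or position 140) can be
looked up and substituted. The sketch below is not part of
the original disclosure: the helper names are assumptions,
the codon table is the standard genetic code, and the plan
is copied as it is transcribed in this text, with the CAG
alternative at position 140.

```python
# Illustrative sketch; helper names are assumptions. The plan below is the
# top-strand codon plan as transcribed in the text (CAG form at codon 140).

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code built from the conventional T, C, A, G ordering.
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

PLAN = """
TGT-TAC-TGC-CAG-GAT-CCG-TAC-GTT-AAG-GAA-GCA-GAA-
AAC-CTG-AAA-AAA-TAC-TTC-AAC-GCA-GGC-CAC-TCC-GAC-GTA-
GCT-GAT-AAC-GGC-ACC-CTG-TTC-CTG-GGT-ATC-CTA-AAA-AAC-
TGG-AAA-GAG-GAA-TCC-GAC-CTG-AAG-ATC-ATG-CAG-TCT-CAA-
ATT-GTA-AGC-TTC-TAC-TTC-AAA-CTG-TTC-AAG-AAC-TTC-AAA-
GAC-GAT-CAA-TCC-ATC-CAG-AAG-AGC-GTA-GAA-ACT-ATT-AAG-
GAG-GAC-ATG-AAC-GTA-AAA-TCC-TTT-AAC-AGC-AAC-AAG-AAG-
AAA-CGC-GAT-GAC-TTC-GAG-AAA-CTG-ACT-AAC-TAC-TCT-GTT-
ACA-GAT-CTG-AAC-GTG-CAG-CGT-AAA-GCT-ATT-CAC-GAA-CTG-
ATC-CAA-GTT-ATG-GCT-GAA-CTG-TCT-CCT-GCG-GCA-AAG-ACT-
GGC-AAA-CGC-AAG-CGT-AGC-CAG-ATG-CTG-TTT-CAG-
CGT-CGC-CGT-GCT-TCT-CAG
"""


def parse_codons(plan: str) -> list[str]:
    """Split a hyphen-separated codon plan into a list of triplets."""
    codons = [c for c in "".join(plan.split()).split("-") if c]
    for codon in codons:
        if len(codon) != 3 or codon not in CODON_TABLE:
            raise ValueError(f"not a codon: {codon!r}")
    return codons


def translate(codons: list[str]) -> str:
    """Translate codons into one-letter amino acid codes."""
    return "".join(CODON_TABLE[c] for c in codons)


def substitute(codons: list[str], position: int, new_codon: str) -> list[str]:
    """Return a copy with the codon at a 1-based position replaced."""
    changed = list(codons)
    changed[position - 1] = new_codon
    return changed


if __name__ == "__main__":
    codons = parse_codons(PLAN)
    print(len(codons), translate(codons[:4]), codons[139])
    print(translate(substitute(codons, 140, "CGT"))[139])
```

Numbering codons from 1 keeps the code aligned with the
amino acid positions used in the text, so the [Arg140]
alternative is simply a substitution at position 140.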
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The following example illustrates a preferred 
general procedure for preparation of deoxyoligonucleo- 
tides for use in the manufacture of DNA sequences 
of the invention. 

EXAMPLE 2 

Oligonucleotide fragments were synthesized using a
four-step procedure and several intermediate washes.
Polymer bound dimethoxytrityl protected nucleoside in a
sintered glass funnel was first stripped of its
5'-protecting group (dimethoxytrityl) using 3%
trichloroacetic acid in dichloromethane for 1-1/2
minutes. The polymer was then washed with
methanol/tetrahydrofuran and acetonitrile. The washed
polymer was then rinsed with dry acetonitrile, placed
under argon and then treated in the condensation step as
follows. 0.5 ml of a solution of 10 mg tetrazole in
acetonitrile was added to the reaction vessel containing
polymer. Then 0.5 ml of 30 mg protected nucleoside
phosphoramidite in acetonitrile was added. This reaction
was agitated and allowed to react for 2 minutes. The
reactants were then removed by suction and the polymer
rinsed with acetonitrile. This was followed by the
oxidation step wherein 1 ml of a solution containing
0.1 molar I2 in 2,6-lutidine/H2O/THF, 1:2:2, was reacted
with the polymer bound oligonucleotide chain for
2 minutes. Following a THF rinse, capping was done using
a solution of dimethylaminopyridine (6.5 g in 100 ml THF)
and acetic anhydride in the proportion 4:1 for 2 minutes.
This was followed by a methanol rinse and a THF rinse.
Then the cycle began again with a trichloroacetic acid in
CH2Cl2 treatment. The cycle was repeated until the
desired oligonucleotide sequence was obtained.

The final oligonucleotide chain was treated with
thiophenol, dioxane, triethylamine, 1:2:2, for 45 minutes
at room temperature. Then, after rinsing with dioxane,
methanol and diethylether, the oligonucleotide was
cleaved from the polymer with concentrated ammonium
hydroxide at room temperature. After decanting the
solution from the polymer, the concentrated ammonium
hydroxide solution was heated at 60°C for 16 hours in a
sealed tube.
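
As a rough aid to planning, the timed reaction steps of the
cycle described above can be tabulated and summed. The
sketch below is illustrative only. It counts just the four
reaction steps whose durations are stated in the text (the
washes and rinses are not timed there), and it assumes one
cycle per added nucleotide with the first nucleoside already
bound to the polymer; the names used are hypothetical.

```python
# Illustrative sketch; step names and the cycles-per-oligomer rule are
# assumptions. Durations are the reaction times stated in the text.

CYCLE_STEPS = [
    ("detritylation (3% trichloroacetic acid)", 1.5),
    ("condensation (tetrazole + phosphoramidite)", 2.0),
    ("oxidation (0.1 M iodine)", 2.0),
    ("capping (DMAP / acetic anhydride)", 2.0),
]


def reaction_minutes_per_cycle() -> float:
    """Sum of the stated reaction times for one addition cycle."""
    return sum(minutes for _, minutes in CYCLE_STEPS)


def total_reaction_minutes(oligo_length: int) -> float:
    """Reaction time for an oligomer, assuming the first nucleoside is
    already polymer bound so that (length - 1) cycles are needed."""
    if oligo_length < 1:
        raise ValueError("oligo_length must be at least 1")
    return (oligo_length - 1) * reaction_minutes_per_cycle()


if __name__ == "__main__":
    print(reaction_minutes_per_cycle())
    print(total_reaction_minutes(20))
```

Actual bench time would be longer, since the intermediate
washes, rinses and transfers are not included in the sum.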

Each oligonucleotide solution was then extracted four
times with 1-butanol. The solution was loaded into a 20%
polyacrylamide 7 molar urea electrophoresis gel and,
after running, the appropriate product DNA band was
isolated.

Subunits were then assembled from deoxyoligonucleotides
according to the general procedure for assembly of
subunit IF-1.

Following the isolation of the desired 14 DNA segments,
subunit IF-1 was constructed in the following manner:

1. One nanomole of each of the DNA fragments, excluding
segment 13 and segment 2 which contain 5' cohesive ends,
was subjected to 5'-phosphorylation;

2. The complementary strands of DNA, segments 13 and 14,
11 and 12, 9 and 10, 7 and 8, 5 and 6, 3 and 4 and 1 and
2 were combined together, warmed to 90° and slowly cooled
to 25°;

3. The resulting annealed pairs of DNA were combined
sequentially and warmed to 37° and slowly cooled to 25°;

4. The concentration of ATP and DTT in the final tube
containing segments 1 thru 14 was adjusted to 150 µM and
18 mM respectively. Twenty units of T-4 DNA ligase was
added to this solution and the reaction was incubated at
4° for 18 hrs;

5. The resulting crude product was heated to 90° for
2 min. and subjected to gel filtration on Sephadex G50/40
using 10 mM triethyl ammonium bicarbonate as the eluent;

6. The desired product was purified, following 5'
phosphorylation, using an 8% polyacrylamide-TBE gel.

Subunits IF-2, IF-3 and IF-4 were constructed in a
similar manner.

The following example relates to: assembly of the
complete human immune interferon gene from subunits IF-1,
IF-2, IF-3, and IF-4; procedures for the growing, under
appropriate nutrient conditions, of transformed E. coli
cells; the isolation of human immune interferon from the
cells; and the testing of biological activity of
interferon so isolated.

EXAMPLE 3 

The major steps in the general procedure for assembly of
the complete human IFN-γ specifying genes from subunits
IF-1, IF-2, and IF-3 are illustrated in Figure 1.

The 136 base pair subunit IF-1 was electroeluted from the
gel, ethanol precipitated and resuspended in water at a
concentration of 0.05 pmol/µl. Plasmid pBR322 (2.0 pmol)
was digested with EcoRI and SalI, treated with
phosphatase, phenol extracted, ethanol precipitated, and
resuspended in water at a concentration of 0.1 pmol/µl.
Ligation was carried out with 0.1 pmol of the plasmid and
0.2 pmol of subunit IF-1, using T-4 DNA ligase to form
hybrid plasmid pINT1. E. coli were transformed and
multiple copies of pINT1 were isolated therefrom.
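
The quantities given above imply simple pipetting volumes
and an insert-to-vector molar ratio. The following worked
arithmetic is a sketch only; the function name is an
assumption, and exact fractions are used merely to avoid
floating-point rounding.

```python
# Illustrative arithmetic; the helper name is an assumption.
from fractions import Fraction


def volume_ul(amount_pmol: str, conc_pmol_per_ul: str) -> Fraction:
    """Volume (in microliters) that delivers the given amount."""
    return Fraction(amount_pmol) / Fraction(conc_pmol_per_ul)


# Amounts and concentrations as stated in the text.
plasmid_ul = volume_ul("0.1", "0.1")    # plasmid at 0.1 pmol/ul
insert_ul = volume_ul("0.2", "0.05")    # subunit IF-1 at 0.05 pmol/ul
insert_to_vector = Fraction("0.2") / Fraction("0.1")

if __name__ == "__main__":
    print(f"plasmid: {float(plasmid_ul)} ul")
    print(f"IF-1:    {float(insert_ul)} ul")
    print(f"insert:vector molar ratio = {insert_to_vector}:1")
```

On these figures the ligation uses a two-fold molar excess
of subunit over digested plasmid.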
The above procedure was repeated for purposes of
inserting the 153 base pair subunit IF-2 to form pINT2
except that the plasmid was digested with EcoRI and
BglII. The 153 base pair IF-3 subunit was similarly
inserted into pINT2 during manufacture of pINT3 except
that EcoRI and HindIII were used to digest the plasmid.

An IF-4 subunit was employed in the construction of the
final expression vector as follows: Plasmid PWI was
purchased from Stanford University, Palo Alto,
California, and digested with PvuII. Using standard
procedures, an EcoRI recognition site was inserted in the
plasmid at a PvuII site. Copies of this hybrid were then
digested with EcoRI and HpaI to provide a 245 base pair
sequence including a portion of the trp
promoter/operator region. By standard procedures, IF-4
was added to the HpaI site in order to incorporate the
remaining 37 base pairs of the complete trp translational
initiation signal and bases providing codons for the
initial four amino acids of immune interferon
(Cys-Tyr-Cys-Gln). The resulting assembly was then
inserted into pINT3 which had been digested with EcoRI
and BamHI to yield a plasmid designated pINTγ-trpI7.

E. coli cells containing pINTγ-trpI7 were grown on K
media in the absence of tryptophan to an O.D.600 of 1.
Indoleacrylic acid was added at a concentration of 20 µg
per ml and the cells were cultured for an additional
2 hours at 37°C. Cells were harvested by centrifugation
and the cell pellet was resuspended in fetal calf serum
buffered with HEPES (pH 8.0). Cells were lysed by one
passage through a French press at 10,000 psi. The cell
lysate was cleared of debris by centrifugation and the
supernatant was assayed for antiviral activity by the CPE
assay ["The Interferon System", Stewart, ed.,
Springer-Verlag, N.Y., N.Y. (1981)]. The isolated product
of expression was designated γ-1.

The following example relates to a modification in the
DNA sequence of plasmid pINTγ-trpI7 which facilitated the
use of the vector in the trp promoter-controlled
expression of structural genes coding for, e.g., analogs
of IFN-γ and IFN-αF.

EXAMPLE 4 

Segment IF-4, as previously noted, had been constructed
to include bases coding for an initial methionine and the
first four amino acids of IFN-γ as well as 37 base pairs
(commencing at its 5' end with a HpaI blunt end) which
completed the 3' end of a trp promoter/operator sequence,
including a Shine-Dalgarno ribosome binding sequence. It
was clear that manipulations involving sequences coding
for IFN-γ analogs and for polypeptides other than IFN-γ
would be facilitated if a restriction site 3' to the
entire trp promoter/operator region could be established.
By way of illustration, sequences corresponding to IF-4
for other genes could then be constructed without having
to reconstruct the entire 37 base pairs needed to
reconstitute the trp promoter/operator and would only
require bases at the 5' end such as would facilitate
insertion in the proper reading frame with the complete
promoter/operator.

Consistent with this goal, sequence IF-4 was
reconstructed to incorporate an XbaI restriction site 3'
to the base pairs completing the trp promoter/operator.
The construction is shown in Table V below.

AA CTA GTA CGC AAG TTC ACG TAA AAA GGG
TT GAT CAT GCG TTC AAG TGC ATT | TTT CCC

    XbaI    -1   1   2   3   4
            Met Cys Tyr Cys Gln
TAT CTA GAA ATG TGT TAC TGC CA
ATA GAT CTT TAC ACA ATG ACG GTC CTAG|



This variant form of segment IF-4 was inserted in
pINTγ-trpI7 (digested with HpaI and BamHI) to generate
plasmid pINTγ-TXb4 from which the IFN-γ-specifying gene
could be deleted by digestion with XbaI and SalI and the
entire trp promoter/operator would remain on the large
fragment.

The following example relates to construction of
structural analogs of IFN-γ whose polypeptide structure
differs from that of IFN-γ in terms of the identity or
location of one or more amino acids.



EXAMPLE 5 



A first class of analogs of IFN-γ was formed which
included a lysine residue at position 81 in place of
asparagine. The single base sequence change needed to
generate this analog was in subunit IF-2 of Table IV in
segments 35 and 36. The asparagine-specifying codon, AAC,
was replaced by the lysine-specifying codon, AAG. The
isolated product of expression of such a modified DNA
sequence, [Lys81]IFN-γ, was designated γ-10.
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Another class of IFNy analogs consists of 
polypeptides wherein one or more potential glycosylation
sites present in the amino acid sequence are deleted.
More particularly, these consist of [Arg140]IFN-γ or
[Gln140]IFN-γ wherein the polypeptide sequence fails to
include one or more naturally occurring sequences,
[(Asn or Gln)-(ANY)-(Ser or Thr)], which are known to
provide sites for glycosylation of the polypeptide. One
such sequence in IFN-γ spans positions 28 through 30
(Asn-Gly-Thr); another spans positions 101 through 103
(Asn-Tyr-Ser). Preparation of an analog according to the
invention with a modification at positions 28-30 involved
cleavage of plasmid containing all four IFN-γ subunits
with BamHI and HindIII to delete subunit IF-3, followed
by insertion of a variant of subunit IF-3 wherein the AAC
codon for asparagine therein is replaced by the codon for
glutamine, CAG. (Such replacement is effected by
modification of deoxyoligonucleotide segment 37 to
include CAG rather than AAC and of segment 38 to include
GTC rather than TTG. See Table IV.) The isolated product
of expression of such a modified DNA sequence,
[Gln28]IFN-γ, was designated γ-12. Polypeptide analogs of
this type would likely not be glycosylated if expressed
in yeast cells. Polypeptide analogs as so produced are
not expected to differ appreciably from
naturally-occurring IFN-γ in terms of reactivity with
antibodies to the natural form, or in duration of
antiproliferative or immunomodulatory pharmacological
effects, but may display enhanced potency of
pharmacological activity in one or more manner.
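
A codon change on the top strand requires a matching change
in the overlapping bottom strand segment, as in the
AAC-to-CAG replacement at position 28 described above. The
sketch below is illustrative only and uses assumed function
names. It writes the bottom strand bases in the 3' to 5'
direction, aligned beneath the top strand codon, which is
the orientation in which the text's TTG and GTC pair with
AAC and CAG.

```python
# Illustrative sketch; function names are assumptions.

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complement(seq: str) -> str:
    """Bottom strand bases written 3'->5', aligned under the top strand."""
    return "".join(COMPLEMENT[base] for base in seq.upper())


def changed_positions(old_codon: str, new_codon: str) -> list[int]:
    """1-based positions within the codon where the base differs."""
    return [
        i + 1
        for i, (a, b) in enumerate(zip(old_codon, new_codon))
        if a != b
    ]


def paired_change(old_codon: str, new_codon: str) -> dict[str, tuple[str, str]]:
    """Old and new bases for the top strand and the bottom strand."""
    return {
        "top": (old_codon, new_codon),
        "bottom": (complement(old_codon), complement(new_codon)),
    }


if __name__ == "__main__":
    # Asn (AAC) to Gln (CAG) at position 28, as described in the text.
    print(paired_change("AAC", "CAG"))
    print(changed_positions("AAC", "CAG"))
```

The same helper applies to the single-base AAC-to-AAG change
at position 81, where only the third base of the codon and
its partner on the bottom strand differ.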

Other classes of IFN-γ analogs consist of polypeptides
wherein the [Trp39] residue is replaced by [Phe39],
and/or wherein one or more of the methionine residues at
amino acid positions 48, 80, 120 and 137 are replaced by,
e.g., leucine, and/or wherein cysteines at amino acid
positions 1 and 3 are replaced by, e.g., serine or are
completely eliminated. These last-mentioned analogs may
be more easily isolated upon microbial expression because
they lack the capacity for formation of intermolecular
disulfide bridges.

Replacement of tryptophan with phenylalanine at position
39 required substitution for a TGG codon in subunit IF-3
with TTC (although TTT could also have been used),
effected by modification of the deoxyoligonucleotide
segment 33 (TGG to TTC) and overlapping segment 36 (TGA
to TAC) used to manufacture IF-3. [Phe39, Lys81]IFN-γ,
the isolated product of expression of such a modified DNA
sequence (which also included the above-noted replacement
of asparagine by lysine at position 81) was designated
γ-5.

In a like manner, replacement of one or more methionines
at positions 48, 80, 120, and 137, respectively, involves
alteration of subunit IF-3 (with reconstruction of
deoxyoligonucleotides 31, 32 and 34), subunit IF-2 (with
reconstruction of deoxyoligonucleotide segments 21 and
22); and subunit IF-1 (with reconstruction of
deoxyoligonucleotide segments 7 and 10 and/or 3 and 4).
An analog of IFN-γ wherein threonine replaced methionine
at position 48 was obtained by modification of segment 31
in subunit IF-3 to delete the methionine-specifying codon
ATG and replace it with an ACT codon. Alterations in
segment 34 (TAC to TGA) were also needed to effect this
change. [Thr48, Lys81]IFN-γ, the isolated product of
expression of such a modified DNA sequence (also
including a lysine-specifying codon at position 81) was
designated γ-6.

Replacement or deletion of cysteines at positions 1 and 3
involves only alteration of subunit IF-4. As a first
example, modifications in construction of subunit IF-4 to
replace both of the cysteine-specifying codons at
positions 1 and 3 (TGT and TGC, respectively) with the
serine-specifying codon, TCT, required reconstruction of
only 2 segments (see e and f of Table IV).
[Ser1, Ser3, Lys81]IFN-γ, the isolated product of
expression of the thus modified [Lys81]IFN-γ DNA
sequence, was designated γ-2. As another example,
[Lys1, Lys2, Gln3, Lys81]IFN-γ, designated γ-3, was
obtained as an expression product of a modified
construction of subunit IF-4 wherein codons AAA, AAA, and
CAA respectively replaced TGT, TAC and TGC. Finally,
[des-Cys1, des-Tyr2, des-Cys3, Lys81]IFN-γ, designated
γ-4, was obtained by means of modification of subunit
IF-4 sections to delete the corresponding codons in the
amino acid specifying region. It should be noted that the
above modifications in the initial amino acid coding
regions of the gene were greatly facilitated by the
construction of pINTγ-TXb4 in Example 4, which meant that
only short sequences with XbaI and BamHI sticky ends
needed to be constructed to complete the amino terminal
protein coding sequence and link the gene to the complete
trp promoter.

Among other classes of IFN-γ analog polypeptides provided
by the present invention are those including polypeptides
which differ from IFN-γ in terms of amino acids
traditionally held to be involved in secondary and
tertiary configuration of polypeptides. As an example,
provision of a cysteine residue at an intermediate
position in the IFN-γ polypeptide may generate a species
of polypeptide structurally facilitative of formation of
intramolecular disulfide bridges between amino terminal
and intermediate cysteine residues such as found in
IFN-α. Further, insertion or deletion of prolines in
polypeptides according to the invention may alter linear
and bending configurations with corresponding effects on
biological activity. [Lys81, Cys95]IFN-γ, designated γ-9,
was isolated upon expression of a DNA sequence fashioned
with

    5'-TCG-3'
    3'-AGC-5'

replacing

    5'-TTC-3'
    3'-AAG-5'

in sections 17 and 18 of subunit IF-2. A DNA sequence
specifying [Cys95]IFN-γ (to be designated γ-11) is being
constructed by the same general procedure. Likewise, a
gene coding for [Cys95, Pro104]IFN-γ is under
construction with the threonine-specifying codon ACA
(section 15 of IF-2) being replaced by the
proline-specifying codon CCA.

[Glu5]IFN-γ, to be designated γ-13, will result from
modification of section 43 in subunit IF-3 to include the
glutamate codon, GAA, rather than the aspartic acid
specifying codon, GAT. Because such a change would no
longer permit the presence of a BamHI recognition site at
that locus, subunit IF-3 will likely need to be
constructed as a composite subunit with the amino acid
specifying portions of subunit IF-4, leaving no
restriction site between XbaI and HindIII in the
assembled gene. This analog of IFN-γ is expected to be
less acid labile than the naturally-occurring form.

The above analogs having the above-noted tryptophan
and/or methionine and/or cysteine replacements are not
expected to differ from naturally-occurring IFN-γ in
terms of reactivity with antibodies to the natural form
or in potency of antiproliferative or immunomodulatory
effect but are expected to have enhanced duration of
pharmacological effects.

Still another class of analogs consists of polypeptides
of a "hybrid" or "fused" type which include one or more
additional amino acids at the end of the prescribed
sequence. These would be expressed by DNA sequences
formed by the addition, to the entire sequence coding for
IFN-γ, of another manufactured DNA sequence, e.g., one of
the subunits coding for a sequence of polypeptides
peculiar to LeIFN-Con, described infra. The polypeptide
expressed is expected to retain at least some of the
antibody reactivity of naturally-occurring IFN-γ and to
display some degree of the antibody reactivity of LeIFN.
Its pharmacological activities are expected to be
superior to naturally-occurring IFN-γ both in terms of
potency and duration of action.

Table VI, below, sets forth the results of studies of
antiviral activity of IFN-γ prepared according to the
invention along with that of certain of the analogs
tested. Relative antiviral activity was assayed in human
HeLa cells infected with encephalomyocarditis virus
(EMCV) per unit binding to a monoclonal antibody to IFN-γ
as determined in an immunoab-

TABLE VI

                Relative Antiviral
Interferon      Activity

γ-1             1.00
γ-4             0.60
γ-5             0.10
γ-6             0.06
γ-10            0.51



The following example relates to modifications in the
polypeptide coding region of the DNA sequences of the
previous examples which serve to enhance the expression
of desired products.



EXAMPLE 6 



Preliminary analyses performed on the polypeptide
products of microbial expression of manufactured DNA
sequences coding for IFN-γ and analogs of IFN-γ revealed
that two major proteins were produced in approximately
equal quantities: a 17K form corresponding to the
complete 146 amino acid sequence and a 12K form
corresponding to an interferon fragment missing about 50
amino acids of the amino terminal. Review of codon usage
in the manufactured gene revealed the likelihood that the
abbreviated species was formed as a result of microbial
translation initiation at the Met48 residue brought about
by the similarity of base sequences 5' thereto to a
Shine-Dalgarno ribosome binding sequence. It thus
appeared that while about half of the transcribed mRNAs
bound to ribosomes only at a locus prior to the initial
methionine, the other half were bound at a locus prior to
the Met48 codon. In order to diminish the likelihood of
ribosome binding internally within the polypeptide coding
region, sections 33 and 34 of subunit IF-3 were
reconstructed. More specifically, the GAG codon employed
to specify a glutamate residue at position 41 was
replaced by the alternate, GAA, codon and the CGT codon
employed to specify arginine at position 45 was replaced
by the alternate, CGC, codon. These changes, effected
during construction of the gene specifying the γ-6 analog
of IFN-γ, resulted in the expression of a single
predominant species of polypeptide of the appropriate
length.
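
The two codon replacements described above alter the base
sequence without altering the encoded amino acids. This can
be confirmed against the standard genetic code, as in the
following sketch. It is illustrative only; the helper names
are assumptions and form no part of the original procedure.

```python
# Illustrative sketch; helper names are assumptions.

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code built from the conventional T, C, A, G ordering.
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def is_synonymous(old_codon: str, new_codon: str) -> bool:
    """True if both codons specify the same amino acid."""
    return CODON_TABLE[old_codon.upper()] == CODON_TABLE[new_codon.upper()]


# Codon replacements stated in the text, keyed by amino acid position.
REPLACEMENTS = {41: ("GAG", "GAA"), 45: ("CGT", "CGC")}

if __name__ == "__main__":
    for position, (old, new) in REPLACEMENTS.items():
        residue = CODON_TABLE[old]
        print(position, old, "->", new, residue, is_synonymous(old, new))
```

A check of this kind is a convenient safeguard whenever a
coding region is edited only to remove an unwanted sequence
feature such as an internal ribosome binding site.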

The following Examples 7 and 8 relate to procedures of
the invention for generating a manufactured gene
specifying the F subtype of human leukocyte interferon
("LeuIFN-F" or "IFN-αF") and polypeptide analogs thereof.

EXAMPLE 7 

The amino acid sequence for the human leukocyte
interferon of the F subtype has been deduced by way of
sequencing of cDNA clones. See, e.g., Goeddel, et al.,
Nature, 290, pp. 20-26 (1981). The general procedures of
prior Examples 1, 2 and 3 were employed in the design and
assembly of a manufactured DNA sequence for use in
microbial expression of IFN-αF in E. coli by means of a
pBR322-derived expression vector. A general plan for the
construction of three "major" subunit DNA sequences
(LeuIFN-F I, LeuIFN-F II and LeuIFN-F III) and one
"minor" subunit DNA sequence (LeuIFN-F IV) was evolved
and is shown in Table VII below.
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As in the case of the gene manufacture 
strategy set out in Table IV, the strategy of Table VII
involves use of bacterial preference codons wherever it
is not inconsistent with deoxyribonucleotide segment
constructions. Construction of an expression vector with
the subunits was similar to that involved with the
IFN-γ-specifying gene, with minor differences in
restriction enzymes employed. Subunit I is ligated into
pBR322 cut with EcoRI and SalI. (Note that the subunit
terminal portion includes a single stranded SalI "sticky
end" but, upon complementation, a SalI recognition site
is not reconstituted. A full BamHI recognition site
remains, however, allowing for subsequent excision of the
subunit.) This first intermediate plasmid is amplified
and subunit II is inserted into the amplified plasmid
after again cutting with EcoRI and SalI. The second
intermediate plasmid thus formed is amplified and subunit
III is inserted into the amplified plasmid cut with EcoRI
and HindIII. The third intermediate plasmid thus formed
is amplified. Subunit IV is ligated to an EcoRI and XbaI
fragment isolated from pINTγ-TXb4 of Example 4 and this
ligation product (having EcoRI and BstEII sticky ends) is
then inserted into the amplified third intermediate
plasmid cut with EcoRI and BstEII to yield the final
expression vector.

The isolated product of trp promoter/operator controlled
E. coli expression of the manufactured DNA sequence of
Table VII as inserted into the final expression vector
was designated IFN-αF1.

EXAMPLE 8

As discussed infra with respect to consensus leukocyte
interferon, those human leukocyte interferon subtypes
having a threonine residue at position 14 and a
methionine residue at position 16 are reputed to display
greater antiviral activity than those subtypes possessing
Ala14 and Ile16 residues. An analog of human leukocyte
interferon subtype F was therefore manufactured by means
of microbial expression of a DNA sequence of Example 7
which had been altered to specify threonine and
methionine as residues 14 and 16, respectively. More
specifically, [Thr14, Met16]IFN-αF, designated IFN-αF2,
was expressed in E. coli upon transformation with a
vector of Example 7 which had been cut with SalI and
HindIII and into which a modified subunit II (of Table
VII) was inserted. The specific modifications of subunit
II involved assembly with segment 39 altered to replace
the alanine-specifying codon, GCT, with a
threonine-specifying ACT codon and replace the
isoleucine-specifying codon, ATT, with an ATG codon.
Corresponding changes in complementary bases were made in
section 40 of subunit LeuIFN-F II.

The following Examples 9 and 10 relate to practice of the
invention in the microbial synthesis of consensus human
leukocyte interferon polypeptides which can be designated
as analogs of human leukocyte interferon subtype F.

EXAMPLE 9



"Consensus human leukocyte interferon" ("IFN-Con",
"LeuIFN-Con") as employed herein shall mean a
non-naturally-occurring polypeptide which predominantly
includes those amino acid residues which are common to
all naturally-occurring human leukocyte interferon
subtype sequences and which includes, at one or more of
those positions wherein there is no amino acid common to
all subtypes, an amino acid which predominantly occurs at
that position and in no event includes any amino acid
residue which is not extant in that position in at least
one naturally-occurring subtype. (For purposes of this
definition, subtype A is positionally aligned with other
subtypes and thus reveals a "missing" amino acid at
position 44.) As so defined, a consensus human leukocyte
interferon will ordinarily include all known common amino
acid residues of all subtypes. It will be understood that
the state of knowledge concerning naturally-occurring
subtype sequences is continuously developing. New
subtypes may be discovered which may destroy the
"commonality" of a particular residue at a particular
position. Polypeptides whose structures are predicted on
the basis of a later-amended determination of commonality
at one or more positions would remain within the
definition because they would nonetheless predominantly
include common amino acids and because those amino acids
no longer held to be common would nonetheless quite
likely represent the predominant amino acid at the given
positions. Failure of a polypeptide to include either a
common or predominant amino acid at any given position
would not remove the molecule from the definition so long
as the residue at the position occurred in at least one
subtype. Polypeptides lacking one or more internal or
terminal residues of consensus human leukocyte interferon
or including internal or terminal residues having no
counterpart in any subtype would be considered analogs of
human consensus leukocyte interferon.
Published predicted amino acid sequences for eight
cDNA-derived human leukocyte interferon subtypes were analyzed
in the context of the identities of amino acids within the
sequence of 166 residues. See, generally, Goedell, et al.,
Nature, 290, pp. 20-26 (1981), comparing LeIFN-A through LeIFN-H
and noting that only 79 amino acids appear in identical
positions in all eight interferon forms and 99 amino acids
appear in identical positions if the E subtype (deduced from a
cDNA pseudogene) was ignored. Each of the remaining positions
was analyzed for the relative frequency of occurrence of a given
amino acid and, where a given amino acid appeared at the same
position in at least five of the eight forms, it was designated
as the predominant amino acid for that position. A "consensus"
polypeptide sequence of 166 amino acids was plotted out and
compared back to the eight individual sequences, resulting in
the determination that LeIFN-F required few modifications from
its "naturally-occurring" form to comply with the consensus
sequence.
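
The position-by-position rule described above (identical in all eight forms, or present in at least five of the eight) can be sketched as follows. This is only an illustrative sketch: the function name, the threshold parameter and the three example columns are hypothetical and are not taken from the Goedell, et al. alignment.

```python
from collections import Counter

def classify_position(residues, predominant_min=5):
    """Classify one aligned position across the subtype sequences.

    Returns ("common", residue) when every subtype carries the same
    residue, ("predominant", residue) when one residue occurs in at
    least `predominant_min` subtypes, and ("none", None) otherwise.
    """
    counts = Counter(residues)
    residue, n = counts.most_common(1)[0]
    if n == len(residues):
        return "common", residue
    if n >= predominant_min:
        return "predominant", residue
    return "none", None

# Hypothetical aligned columns for eight subtypes (illustration only).
columns = {
    1: ["Cys"] * 8,                 # identical in all eight forms
    2: ["Arg"] * 5 + ["Gly"] * 3,   # five of eight agree
    3: ["Thr"] * 4 + ["Ala"] * 4,   # no residue reaches five
}

labels = {pos: classify_position(col) for pos, col in columns.items()}
for pos, label in labels.items():
    print(pos, label)
```

Keeping the threshold as a parameter makes it easy to re-run the classification when further subtypes are added to the comparison.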

A program for construction of a manufactured IFN-Con DNA
sequence was developed and is set out below in Table VIII. In
the table, an asterisk designates the variations in IFN-αF
needed to develop LeIFN-Con₁, i.e., to develop the [Arg²², Ala⁷⁶,
Asp⁷⁸, Glu⁷⁹, Tyr⁸⁶, Tyr⁹⁰, Leu⁹⁶, Thr¹⁵⁶, Asn¹⁵⁷, Leu¹⁵⁸] analog
of IFN-αF. The illustrated top strand sequence includes, wherever
possible, codons noted to be the subject of preferential
expression in E. coli. The sequence also includes bases providing
recognition sites for SalI, HindIII, and BstEII at positions
intermediate the sequence and for XbaI and BamHI at its ends.
The latter sites are selected for use in incorporation of the
sequence in a pBR322 vector, as was the case with the sequence
developed for IFN-αF and its analogs.

TABLE VIII

  -1  Met-Cys-Asp-Leu-Pro-Gln-Thr-His- . . .
      ATG TGT GAT TTA CCT CAA ACT CAT  . . .

  14  . . .

  28  Ser-Cys-Leu-Lys-Asp-Arg-His-Asp-Phe-Gly-Phe-Pro-Gln-Glu-
      AGC TGC CTG AAA GAC CGT CAC GAC TTC GGC TTT CCG CAA GAA

  42  . . .

  56  Val-Leu- . . .
      GTA CTG  . . .

  70  Thr-Lys- . . .
      ACT AAA  . . .

               *               *                       *
  84  Lys-Phe-Tyr-Thr-Glu-Leu-Tyr-Gln-Gln-Leu-Asn-Asp-Leu-Glu-
      AAG TTC TAC ACT GAA CTG TAT CAG CAG CTG AAC GAC CTG GAA

  98  Ala-Cys-Val-Ile-Gln-Glu-Val-Gly-Val-Glu-Glu-Thr-Pro-Leu-
      GCA TGC GTA ATC CAG GAA GTT GGT GTA GAA GAG ACT CCG CTG

 112  Met-Asn-Val-Asp-Ser-Ile-Leu-Ala-Val-Lys-Lys-Tyr-Phe-Gln-
      ATG AAC GTC GAC TCT ATT CTG GCA GTT AAA AAG TAC TTC CAG

 126  Arg-Ile-Thr-Leu-Tyr-Leu-Thr-Glu-Lys-Lys-Tyr-Ser-Pro-Cys-
      CGT ATC ACT CTG TAC CTG ACC GAA AAG AAA TAT TCT CCG TGC

 140  Ala-Trp-Glu-Val-Val-Arg-Ala-Glu-Ile-Met-Arg-Ser-Phe-Ser-
      GCT TGG GAA GTA GTT CGC GCT GAA ATT ATG CGT TCT TTC TCT

               *   *   *
 154  Leu-Ser-Thr-Asn-Leu-Gln-Glu-Arg-Leu-Arg-Arg-Lys-Glu Stop Stop
      CTG TCT ACT AAC CTG CAG GAG CGT CTG CGC CGT AAA GAA TAA  TAG
Table IX below sets out the specific double stranded DNA
sequence for preparation of 4 subunit DNA sequences for use in
manufacture of IFN-Con₁. Subunit LeuIFN-Con IV is a duplicate of
LeuIFN-F IV of Table VIII. Segments of subunits which differ from
those employed to construct the IFN-αF gene are designated with
a "prime" (e.g., 37' and 38' are altered forms of sections 37
and 38 needed to provide arginine rather than glycine at
position 22).



TABLE IX

[Rotated table setting out the double stranded sequences of
subunits LeuIFN-Con I through IV; the sequence text is not
legible in this copy.]



The four subunits of Table IX were sequentially inserted
into an expression vector according to the procedure of Example 7
to yield a vector having the coding region of Table VIII under
control of a trp promoter/operator. The product of expression of
this vector in E. coli was designated IFN-Con₁. It will be noted
that this polypeptide includes all common residues indicated in
Goedell, et al., supra, and, with the exception of Ser⁸⁰, Glu⁸³,
Val¹¹⁴, and Lys¹²¹, includes the predominant amino acid indicated
by analysis of the reference's summary of sequences. The four
above-noted residues were retained from the native IFN-αF
sequence to facilitate construction of subunits and assembly of
subunits into an expression vector. (Note, e.g., serine was
retained at position 80 to allow for construction of a HindIII
site.)
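
The role of such intermediate restriction sites can be checked directly on a top strand. The sketch below, offered only as an illustration, scans the IFN-αF top strand as transcribed from claim 40 of this document (the transcription from the scanned text is an assumption and may carry scanning errors) for the BstEII (GGTNACC), HindIII (AAGCTT) and SalI (GTCGAC) recognition sequences. The helper names are hypothetical.

```python
import re

# Top strand of the IFN-alpha-F gene as transcribed from claim 40
# (codons 1-166); the transcription itself is an assumption.
ALPHA_F_CODONS = """
TGT GAT TTA CCT CAA ACT CAT TCT CTT GGT AAC CGT CGC
GCT CTG ATT CTG CTG GCA CAG ATG GGT CGT ATT TTC CCG TTT
AGC TGC CTG AAA GAC CGT CAC GAC TTC GGC TTT CCG CAA GAA
GAG TTC GAT GGC AAC CAA TTC CAG AAA GCT CAG GCA ATC TCT
GTA CTG CAC GAA ATG ATC CAA CAG ACC TTC AAC CTG TTT TCC
ACT AAA GAC AGC TCT GCT ACC TGG GAA CAA AGC TTG CTG GAG
AAG TTC TCC ACT GAA CTG AAC CAG CAG CTG AAC GAC ATG GAA
GCA TGC GTA ATC CAG GAA GTT GGT GTA GAA GAG ACT CCG CTG
ATG AAC GTC GAC TCT ATT CTG GCA GTT AAA AAG TAC TTC CAG
CGT ATC ACT CTG TAC CTG ACC GAA AAG AAA TAT TCT CCG TGC
GCT TGG GAA GTA GTT CGC GCT GAA ATT ATG CGT TCT TTC TCT
CTG AGC AAA ATC TTC CAG GAG CGT CTG CGC CGT AAA GAA
"""
ALPHA_F_TOP = "".join(ALPHA_F_CODONS.split())

# Recognition sequences written as regular expressions (N = any base).
SITES = {"BstEII": "GGT[ACGT]ACC", "HindIII": "AAGCTT", "SalI": "GTCGAC"}

def site_positions(seq, pattern):
    """Return the 0-based start index of every (possibly overlapping) match."""
    return [m.start() for m in re.finditer(f"(?={pattern})", seq)]

def codon_number(index):
    """Map a 0-based nucleotide index to its 1-based codon number."""
    return index // 3 + 1

positions = {name: site_positions(ALPHA_F_TOP, pat) for name, pat in SITES.items()}
for name, hits in positions.items():
    print(name, hits, [codon_number(i) for i in hits])
```

On this transcription each of the three sites occurs exactly once, with the HindIII site spanning codons 79 through 81 (CAA-AGC-TTG), which is consistent with the retention of serine at position 80.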

Since publication of the Goedell, et al. summary of IFN-α
subtypes, a number of additional subtypes have been ascertained.
Figure 2 sets out in tabular form the deduced sequences of the 13
presently known subtypes (exclusive of those revealed by five
known cDNA pseudogenes) with designations of the same IFN-α
subtypes from different laboratories indicated parenthetically
(e.g., IFN-α6 and IFN-αK). See, e.g., Goedell, et al., supra;
Stebbing, et al., in: Recombinant DNA Products: Insulin,
Interferons and Growth Hormones (A. Bollon, ed.), CRC Press
(1983); and Weissman, et al., U.C.L.A. Symp. Mol. Cell. Biol.,
25, pp. 295-326 (1982). Positions where there is no common amino
acid are shown in bold face. IFN-α subtypes are roughly grouped
on the basis of amino acid residues. In seven positions (14, 16,
71, 78, 79, 83, and 160) the various subtypes show just two
alternative amino acids, allowing classification of the subtypes
into two subgroups (I and II) based on which of the seven
positions are occupied by the same amino acid residues. Three
IFN-α subtypes (H, F and B) cannot be classified as Group I or
Group II and, in terms of distinguishing positions, they appear
to be natural hybrids of both group subtypes. It has been
reported that IFN-α subtypes of the Group I type display
relatively high antiviral activity while those of Group II
display relatively high antitumor activity.

IFN-Con₁ structure is described in the final line of the
Figure. It is noteworthy that certain residues of IFN-Con₁ (e.g.,
serine at position 8) which were determined to be "common" on
the basis of the Goedell, et al., sequences are now seen to be
"predominant". Further, certain of the IFN-Con₁ residues
determined to be predominant on the basis of the reference
(Arg²², Asp⁷⁸, Glu⁷⁹, and Tyr⁸⁶) are no longer so on the basis of
updated information, while certain heretofore non-predominant
others (Ser⁸⁰ and Glu⁸³) now can be determined to be predominant.



EXAMPLE 10 

A human consensus leukocyte interferon which differs from
IFN-Con₁ in terms of the identity of amino acid residues at
positions 14 and 16 was prepared by modification of the DNA
sequence coding for IFN-Con₁. More specifically, the expression
vector for IFN-Con₁ was treated with BstEII and HindIII to delete
subunit LeuIFN Con III. A modified subunit was inserted wherein
the alanine-specifying codon, GCT, of sections 39 and 40 was
altered to a threonine-specifying codon, ACT, and the isoleucine
codon, ATT, was changed to ATG. The product of expression of the
modified manufactured gene, [Thr¹⁴, Met¹⁶, Arg²², Ala⁷⁶, Asp⁷⁸,
Glu⁷⁹, Tyr⁸⁶, Tyr⁹⁰, Leu⁹⁶, Thr¹⁵⁶, Asn¹⁵⁷, Leu¹⁵⁸] IFN-αF, was
designated IFN-Con₂.

Presently being constructed is a gene for a consensus human
leukocyte interferon polypeptide which will differ from IFN-Con₁
in terms of the identity of residues at positions 114 and 121.
More specifically, the Val¹¹⁴ and Lys¹²¹ residues which duplicate
IFN-αF subtype residues but are not predominant amino acids will
be changed to the predominant Glu¹¹⁴ and Arg¹²¹ residues,
respectively. Because the codon change from Val¹¹⁴ to Glu¹¹⁴
(e.g., GTC to GAA) will no longer allow for a SalI site at the
terminal portion of subunit LeuIFN Con I (of Table IX), subunits
I and II will likely need to be constructed as a single subunit.
Changing the AAA, lysine, codon of sections 11 and 12 to CTG will
allow for the presence of arginine at position 121. The product
of microbial expression of the manufactured gene, [Arg²², Ala⁷⁶,
Asp⁷⁸, Glu⁷⁹, Tyr⁸⁶, Tyr⁹⁰, Leu⁹⁶, Glu¹¹⁴, Arg¹²¹, Thr¹⁵⁶, Asn¹⁵⁷,
Leu¹⁵⁸] IFN-αF, will be designated IFN-Con₃.
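
The codon replacements of this example can be illustrated on the first sixteen codons of the coding region. The sketch below is an illustration only: it assumes the codons recited for positions 1 through 16 in claim 40 (where position 14 reads GCT and position 16 reads ATT), uses a partial codon table limited to the codons that appear, and the helper names are hypothetical.

```python
# Partial codon table covering only the codons used in this sketch.
CODON_TO_AA = {
    "TGT": "Cys", "GAT": "Asp", "TTA": "Leu", "CCT": "Pro", "CAA": "Gln",
    "ACT": "Thr", "CAT": "His", "TCT": "Ser", "CTT": "Leu", "GGT": "Gly",
    "AAC": "Asn", "CGT": "Arg", "CGC": "Arg", "GCT": "Ala", "CTG": "Leu",
    "ATT": "Ile", "ATG": "Met",
}

# Codons 1-16 of the top strand, as transcribed from claim 40 (assumption).
codons_1_to_16 = (
    "TGT GAT TTA CCT CAA ACT CAT TCT CTT GGT AAC CGT CGC GCT CTG ATT"
).split()

def substitute(codons, changes):
    """Return a copy of `codons` with 1-based positions replaced per `changes`."""
    updated = list(codons)
    for position, codon in changes.items():
        updated[position - 1] = codon
    return updated

def translate(codons):
    """Translate a list of codons using the partial table above."""
    return [CODON_TO_AA[c] for c in codons]

# Ala14 -> Thr14 (GCT -> ACT) and Ile16 -> Met16 (ATT -> ATG).
modified = substitute(codons_1_to_16, {14: "ACT", 16: "ATG"})

original_residues = translate(codons_1_to_16)
modified_residues = translate(modified)
print(original_residues[13], "->", modified_residues[13])
print(original_residues[15], "->", modified_residues[15])
```

Each change alters a single base, so only the affected subunit needs to be resynthesized and exchanged through its flanking restriction sites.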

The following example relates to procedures for enhancing
levels of expression of exogenous genes in bacterial species,
especially E. coli.

EXAMPLE 11 

In the course of development of expression vectors in the
above examples, the trp promoter/operator DNA sequence was
employed which included a ribosome binding site ("RBS") sequence
in a position just prior to the initial translation start (Met⁻¹,
ATG). An attempt was made to increase levels of expression of
the various exogenous genes in E. coli by incorporating DNA
sequences duplicative of portions of putative RBS sequences
extant in genomic E. coli DNA sequences associated with highly
expressed cellular proteins. Ribosome binding site sequences of
such protein-coding genes as reported in Inokuchi, et al., Nuc.
Acids Res., 10, pp. 6957-6968 (1982), Gold, et al., Ann. Rev.
Microbiol., 35, pp. 365-403 (1981) and Alton, et al., Nature,
282, pp. 864-869 (1979), were reviewed and the determination was
made to employ sequences partially duplicative of those
associated with the E. coli proteins OMP-F (outer membrane
protein F), CRO and CAM (chloramphenicol transacetylase).

By way of example, to duplicate a portion of the OMP-F RBS
sequence the following sequence is inserted prior to the Met⁻¹
codon.

    5'-AACCATGAGGGTAATAAATA-3'
    3'-TTGGTACTCCCATTATTTAT-5'

In order to incorporate this sequence in a position prior
to the protein coding region of, e.g., the manufactured gene
coding for IFN-Con₁ or IFN-αF₁, subunit IV of the expression
vector was deleted (by cutting the vector with XbaI and BstEII)
and replaced with a modified subunit IV involving altered
sections 41A and 42A and the replacement of sections 43 and 44
with new segments RB1 and RB2. The construction of the modified
sequence is as set out in Table X, below.

TABLE X

    XbaI                            -1   1   2
                                    Met Cys Asp
    |----------------- RB1 -------------------|
    CTAGAAA CCA TGA GGG TAA TAA ATA ATG TGT GAT
        TTT GGT ACT CCC ATT ATT TAT TAC ACA CTA
        |--------------- RB2 -----------------|

     3   4   5   6   7   8   9
    Leu Pro Gln Thr His Ser Leu  BstEII
    |----------- 41A -----------|
    TTA CCT CAA ACT CAT TCT CTT G
    AAT GGA GTT TGA GTA AGA GAA CATG
    |------------- 42A ------------|

Table XI, below, illustrates the entire DNA sequence in the
region preceding the protein coding region of the reconstructed
gene starting with the HpaI site within the trp promoter/operator
(compare subunit IF-4 of Table IV).

TABLE XI

    HpaI                                     XbaI
    AAC TAG TAC GCA AGT TCA CGT AAA AAG GGT ATC TAG AAA CCA
    TTG ATC ATG CGT TCA AGT GCA TTT TTC CCA TAG ATC TTT GGT

                        -1   1   2   3   4   5   6   7
                        Met Cys Asp Leu Pro Gln Thr His
    TGA GGG TAA TAA ATA ATG TGT GAT TTA CCT CAA ACT CAT
    ACT CCC ATT ATT TAT TAC ACA CTA AAT GGA GTT TGA GTA

     8   9
    Ser Leu  BstEII
    TCT CTT G
    AGA GAA CATG

Similar procedures were followed to incorporate sequences
duplicative of RBS sequences of CRO and CAM genes, resulting in
the following sequences immediately preceding the Met⁻¹ codon.

         1        10        20
    CRO: GCATGTACTAAGGAGGTTGT
         CGTACATGATTCCTCCAACA

         1        10        20
    CAM: CAGGAGCTAAGGAAGCTAAA
         GTCCTCGATTCCTTCGATTT

It will be noted that all the RBS sequence inserts possess
substantial homology to Shine-Dalgarno sequences, are rich in
adenine and include sequences ordinarily providing "stop" codons.
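
The three properties noted for the RBS inserts can be tallied with a short script. The sketch below is illustrative only: it takes the OMP-F, CRO and CAM strands as printed in this example, and it treats the presence of any four-base window of AGGAGG as a rough, assumed stand-in for "homology to Shine-Dalgarno sequences". The function and variable names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")
STOP_TRIPLETS = ("TAA", "TAG", "TGA")
# Four-base windows of the canonical Shine-Dalgarno core AGGAGG (assumed proxy).
SD_WINDOWS = ("AGGA", "GGAG", "GAGG")

# (top strand 5'->3', bottom strand written 3'->5') as printed in the text.
RBS_INSERTS = {
    "OMP-F": ("AACCATGAGGGTAATAAATA", "TTGGTACTCCCATTATTTAT"),
    "CRO": ("GCATGTACTAAGGAGGTTGT", "CGTACATGATTCCTCCAACA"),
    "CAM": ("CAGGAGCTAAGGAAGCTAAA", "GTCCTCGATTCCTTCGATTT"),
}

def summarize(top, bottom):
    """Report strand pairing, adenine count, stop triplets and SD-like windows."""
    return {
        "strands_pair": top.translate(COMPLEMENT) == bottom,
        "adenine_count": top.count("A"),
        "length": len(top),
        "stop_triplets": sorted(t for t in STOP_TRIPLETS if t in top),
        "sd_like": any(w in top for w in SD_WINDOWS),
    }

summary = {name: summarize(top, bottom) for name, (top, bottom) in RBS_INSERTS.items()}
for name, info in summary.items():
    print(name, info)
```

The stop-triplet check looks in every reading frame of the top strand, which matches the text's loose wording ("sequences ordinarily providing stop codons") rather than any particular frame.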
Levels of E. coli expression of IFN-Con₁ were determined
using trp-controlled expression vectors incorporating the three
RBS inserts (in addition to the RBS sequence extant in the
complete trp promoter/operator). Expression of the desired
polypeptide using the OMP-F RBS duplicating sequence was from 150
to 300 mg per liter of culture, representing from 10 to 20
percent of total protein. Vectors incorporating the CAM RBS
duplicating sequence provided levels of expression which were
about one-half that provided by the OMP-F variant. Vectors
including the CRO RBS duplicating sequence yielded the desired
protein at levels of about one-tenth that of the OMP-F variant.
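
The relative figures above translate into approximate yield ranges as follows. This is a back-of-envelope sketch only: the CAM and CRO ranges are derived from the stated "about one-half" and "about one-tenth" ratios rather than from reported measurements, and the variable names are hypothetical.

```python
# Reported OMP-F range (mg of IFN-Con1 per liter of culture).
omp_f_range = (150, 300)

# Stated ratios relative to the OMP-F variant, held as divisors.
divisors = {"OMP-F": 1, "CAM": 2, "CRO": 10}

approx_ranges = {
    name: (omp_f_range[0] / d, omp_f_range[1] / d) for name, d in divisors.items()
}

# Total protein implied by "150 mg = 10 percent" and "300 mg = 20 percent".
implied_total_low = 150 * 100 / 10
implied_total_high = 300 * 100 / 20

for name, (low, high) in approx_ranges.items():
    print(f"{name}: about {low:g} to {high:g} mg per liter")
print("implied total protein:", implied_total_low, implied_total_high)
```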

The following example relates to antiviral 
activity screening of human leukocyte interferon and 
polypeptides provided by the preceding examples. 

EXAMPLE 12 

Table XII below provides the results of testing of antiviral
activity in various cell lines of natural (buffy coat) interferon
and isolated, microbially expressed, polypeptides designated
IFN-αF₁, IFN-αF₂, IFN-Con₁, and IFN-Con₂. Viruses used were VSV
(vesicular stomatitis virus) and EMCV (encephalomyocarditis
virus). Cell lines were from various mammalian sources,
including human (WISH, HeLa), bovine (MDBK), mouse (MLV-6), and
monkey (Vero). Antiviral activity was determined by an end-point
cytopathic effect assay as described in Weck, et al., J. Gen.
Virol., 57, pp. 233-237 (1981) and Campbell, et al., Can. J.
Microbiol., 21, pp. 1247-1253 (1975). Data shown were normalized
for antiviral activity in WISH cells.
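
Normalization to WISH cells amounts to expressing each titer as a percentage of the WISH titer for the same preparation and virus. The sketch below illustrates that arithmetic only; the raw titers are hypothetical values chosen for the illustration and are not data from the assays cited above.

```python
def normalize_to_reference(titers, reference="WISH"):
    """Express each titer as a percentage of the reference cell line's titer."""
    ref = titers[reference]
    return {cell: 100.0 * value / ref for cell, value in titers.items()}

# Hypothetical raw end-point titers (units per ml) for one preparation.
raw_titers = {"WISH": 2.0e6, "HeLa": 8.0e6, "Vero": 2.0e5}

normalized = normalize_to_reference(raw_titers)
for cell, value in normalized.items():
    print(cell, value)
```

By construction every WISH entry becomes 100, so the remaining entries read directly as activity relative to human WISH cells.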



TABLE XII

    Virus  Cell line  Buffy coat  IFN-αF₁  IFN-αF₂  IFN-Con₁  IFN-Con₂

    VSV    WISH          100        100      100      100       100
    VSV    HeLa          400        100      ND*      200       100
    VSV    MDBK         1600         33      ND       200       300
    VSV    MLV-6          20          5      ND         3        20
    VSV    Vero           10        0.1      ND        10         0
    EMCV   WISH          100        100      100      100       100
    EMCV   HeLa          100          5      ND        33        33
    EMCV   Vero          100         20      ND      1000        10

    *ND - no data presently available.

It will be apparent from the above examples that the present
invention provides, for the first time, an entire new genus of
synthesized, biologically active proteinaceous products which
products differ from naturally-occurring forms in terms of the
identity and/or location of one or more amino acids and in terms
of one or more biological (e.g., antibody reactivity) and
pharmacological (e.g., potency or duration of effect) properties
but which substantially retain other such properties.

Products of the present invention and/or antibodies thereto
may be suitably "tagged", for example radiolabeled (e.g., with
I¹²⁵), conjugated with enzymes or fluorescently labelled, to
provide reagent materials useful in assays and/or diagnostic
test kits, for the qualitative and/or quantitative determination
of the presence of such products and/or said antibodies in fluid
samples. Such antibodies may be obtained from the inoculation of
one or more animal species (e.g., mice, rabbit, goat, human,
etc.) or from monoclonal antibody sources. Any of such reagent
materials may be used alone or in combination with a suitable
substrate, e.g., coated on a glass or plastic particle bead.

Numerous modifications and variations in the practice of the
invention are expected to occur to those skilled in the art upon
consideration of the foregoing illustrative examples.
Consequently, the invention should be considered as limited only
to the extent reflected by the appended claims.

WHAT IS CLAIMED IS:

1. A method for the manufacture of linear, double stranded
DNA sequences of a length in excess of about 200 base pairs and
coding for expression of a predetermined continuous sequence of
amino acids with a selected host microorganism transformed by a
selected DNA vector including the sequence, said method
comprising:

(a) preparing two or more different, subunit, linear, double
stranded DNA sequences of from about 100 to about 200 base pairs
in length,

each different subunit DNA sequence comprising a series of
nucleotide base codons coding for a different continuous portion
of said predetermined sequence of amino acids to be expressed,

one terminal region of a first of said subunits comprising a
portion of a base sequence which provides a recognition site for
cleavage by a first restriction endonuclease, which recognition
site is entirely present either once or not at all in said
selected assembly vector upon insertion of the subunit therein,

one terminal region of a second of said subunits comprising
a portion of a base sequence which provides a recognition site
for cleavage by a second restriction endonuclease other than
said first endonuclease, which recognition site is entirely
present once or not at all in said selected assembly vector upon
insertion of the subunit therein,

at least one-half of all remaining terminal regions of
subunits comprising a portion of a recognition site for
restriction endonuclease cleavage by an endonuclease other than
said first and second endonucleases, which recognition site is
entirely present once and only once in said selected assembly
vector after insertion of all subunits thereinto; and

(b) serially inserting each of said subunit DNA sequences
prepared in step (a) into the selected assembly vector and
effecting the biological amplification of the assembly vector
subsequent to each insertion, thereby to form a DNA vector
including the desired DNA sequence coding for the predetermined
continuous amino acid sequence and wherein the desired DNA
sequence assembly includes at least one unique recognition site
for restriction endonuclease cleavage at an intermediate
position therein.

2. A method according to claim 1 wherein the restriction
site for endonuclease cleavage by the restriction endonuclease
other than said first and second endonucleases is a palindromic
six base recognition site and the desired DNA sequence assembled
has at least one unique six base recognition site for
restriction endonuclease cleavage at an intermediate position
therein.

3. A method according to claim 1 wherein at least three
different subunit DNA sequences are prepared in step (a) and
serially inserted into said selected vector in step (b) and the
desired DNA sequence obtained includes at least two unique
restriction endonuclease recognition sites at intermediate
positions therein.

4. A method according to claim 1 wherein the DNA sequence
manufactured comprises an entire structural gene coding for a
biologically active polypeptide.

5. A method according to claim 1 wherein, in the DNA
sequence manufactured, the sequence of nucleotide bases includes
one or more codons selected, from among alternative codons
specifying the same amino acid, on the basis of preferential
expression characteristics of the codon in said selected host
microorganism.

6. A manufactured, linear, double stranded DNA sequence of a
length in excess of about 200 base pairs and coding for the
expression of a predetermined continuous sequence of amino acids
by a selected host microorganism transformed with a selected DNA
vector including the sequence, said sequence characterized by
having at least one unique palindromic six base recognition site
for restriction endonuclease cleavage at an intermediate
position therein.

7. A DNA sequence according to claim 6 characterized by
having two or more unique restriction endonuclease cleavage
sites at intermediate positions therein, at least one of which
has a six base palindromic recognition site.

8. A DNA sequence according to claim 6 which comprises an
entire structural gene coding for a biologically active
polypeptide.

9. A DNA sequence according to claim 8 wherein the gene
codes for a human polypeptide.

10. A DNA sequence according to claim 8 wherein the gene
codes for a polypeptide which differs from a naturally-occurring
human polypeptide in terms of the identity and/or location of
one or more amino acids.

11. A DNA sequence according to claim 6 wherein the sequence
of nucleotide bases includes at least one or more codons
selected, from among alternative codons specifying the same
amino acids, on the basis of preferential expression
characteristics of the codon in said selected host
microorganism.

12. A polypeptide product of the expression by an organism
of a manufactured DNA sequence according to claim 6.

13. A biologically functional DNA microorganism
transformation vector including a manufactured DNA sequence
according to claim 6.

14. A vector according to claim 12 which is a circular DNA
plasmid.

15. A manufactured gene capable of directing the synthesis
in a selected host microorganism of human immune interferon.

16. A manufactured gene according to claim 15 having at
least one unique six base recognition site for restriction
endonuclease cleavage at an intermediate position therein.

17. A manufactured gene according to claim 15 wherein the
base sequence includes one or more codons selected, from among
alternative codons specifying the same amino acid, on the basis
of preferential expression characteristics of the codon in a
projected host microorganism.

18. A manufactured gene according to claim 17 wherein the
base sequence includes one or more codons selected, from among
alternative codons specifying the same amino acid, on the basis
of preferential expression characteristics of the codon in
E. coli.

19. A manufactured gene according to claim 15 wherein the
base sequence, commencing with the 5' end of the top strand, is
as follows:

TGT-TAC-TGC-CAG-GAT-CCG-TAC-GTT-AAG-GAA-GCA-GAA-AAC-CTG-
AAA-AAA-TAC-TTC-AAC-GCA-GGC-CAC-TCC-GAC-GTA-GCT-GAT-AAC-
GGC-ACC-CTG-TTC-CTG-GGT-ATC-CTA-AAA-AAC-TGG-AAA-GAG-GAA-
TCC-GAC-CTG-AAG-ATC-ATG-CAG-TCT-CAA-ATT-GTA-AGC-TTC-TAC-
57
TTC-AAA-CTG-TTC-AAG-AAC-TTC-AAA-GAC-GAT-CAA-TCC-ATC-CAG-
71
AAG-AGC-GTA-GAA-ACT-ATT-AAG-GAG-GAC-ATG-AAC-GTA-AAA-TCC-
TTT-AAC-AGC-AAC-AAG-AAG-AAA-CGC-GAT-GAC-TTC-GAG-AAA-CTG-
ACT-AAC-TAC-TCT-GTT-ACA-GAT-CTG-AAC-GTG-CAG-CGT-AAA-GCT-
ATT-CAC-GAA-CTG-ATC-CAA-GTT-ATG-GCT-GAA-CTG-TCT-CCT-GCG-
GCA-AAG-ACT-GGC-AAA-CGC-AAG-CGT-AGC-CAG-ATG-CTG-TTT-CAG-
141
[or CGT]-CGT-CGC-CGT-GCT-TCT-CAG.

20. A manufactured gene according to claim 19 wherein the
base codon for amino acid 41 is GAA and for amino acid 46 is
CGG.

21. A biologically functional DNA microorganism
transformation vector including a manufactured gene according to
claim 15.

22. A vector according to claim 21 which is a circular DNA
plasmid.

23. A process for the production of human immune interferon
comprising:

growing, under appropriate nutrient conditions,
microorganisms transformed with biologically functional DNA
including a manufactured gene according to claim 15, whereby
said microorganisms express said gene and produce authentic
human immune interferon.

24. A process according to claim 23 wherein the
microorganisms grown are E. coli microorganisms.

25. A manufactured gene capable of directing the synthesis
in a selected host microorganism of a polypeptide which differs
from human immune interferon in terms of the identity and/or
location of one or more amino acids.

26. A manufactured gene according to claim 25 having at
least one unique six base recognition site for restriction
endonuclease cleavage at an intermediate position therein.

27. A manufactured gene according to claim 25 wherein the
base sequence includes one or more codons selected, from among
alternative codons specifying the same amino acid, on the basis
of preferential expression characteristics of the codon in a
projected host microorganism.

28. A manufactured gene according to claim 27 wherein the
base sequence includes one or more codons selected, from among
alternative codons specifying the same amino acid, on the basis
of preferential expression characteristics of the codon in
E. coli.

29. A biologically functional DNA microorganism
transformation vector including a manufactured gene according to
claim 25.

30. A vector according to claim 29 which is a circular DNA
plasmid.

31. A process for the production of a polypeptide analog of
human immune interferon comprising:

growing, under appropriate nutrient conditions,
microorganisms transformed with biologically functional DNA
including a manufactured gene according to claim 25 whereby said
microorganisms express said gene and produce authentic human
immune interferon.

32. A process according to claim 31 wherein the
microorganisms grown are E. coli microorganisms.

33. A polypeptide product of the process of claim 23.

34. A polypeptide product of the process of claim 31.

35. A polypeptide product according to claim 34 selected
from the group consisting of:

    [Lys⁸¹]IFNγ;
    [Ser¹, Ser³, Lys⁸¹]IFNγ;
    [Lys¹, Lys², Gln³, Lys⁸¹]IFNγ;
    [des-Cys¹, des-Tyr², des-Cys³, Lys⁸¹]IFNγ;
    [Phe³⁹, Lys⁸¹]IFNγ;
    [Thr⁴⁸, Lys⁸¹]IFNγ;
    [Lys⁸¹, Cys⁹⁵]IFNγ;
    [Cys⁹⁵]IFNγ;
    [Cys⁹⁵, Pro¹⁰⁴]IFNγ;
    [Gln²⁸]IFNγ; and
    [Glu⁵]IFNγ.

36. A manufactured gene capable of directing the synthesis
in a selected host microorganism of consensus human leukocyte
interferon.

37. Consensus human leukocyte interferon.

38. A consensus human leukocyte interferon according to
claim 37 selected from the group consisting of:

    [Arg²², Ala⁷⁶, Asp⁷⁸, Glu⁷⁹, Tyr⁸⁶, Tyr⁹⁰, Leu⁹⁶, Thr¹⁵⁶,
    Asn¹⁵⁷, Leu¹⁵⁸]IFN-αF;

    [Thr¹⁴, Met¹⁶, Arg²², Ala⁷⁶, Asp⁷⁸, Glu⁷⁹, Tyr⁸⁶, Tyr⁹⁰,
    Leu⁹⁶, Thr¹⁵⁶, Asn¹⁵⁷, Leu¹⁵⁸]IFN-αF; and

    [Arg²², Ala⁷⁶, Asp⁷⁸, Glu⁷⁹, Tyr⁸⁶, Tyr⁹⁰, Leu⁹⁶, Glu¹¹⁴,
    Arg¹²¹, Thr¹⁵⁶, Asn¹⁵⁷, Leu¹⁵⁸]IFN-αF.

39 . A manufactured gene capable of directing 
the synthesis in a selected host microorganism of human 
leukocyte interferon subtype F. 

40. A manufactured gene according to claim 
39 wherein the base sequence of the top strand is as 

follows : _ ^„ 

5 • -TGT-GAT-TTA-CCT-CAA— ACT-CAT-TCT-CTT-GGT-AAC— CGT-CGC- 

GCT-CTG— ATT-CTG-CTG-GCA-CAG-ATG-GGT-CGT-ATT-TTC-CCG-TTT- 
AGC-TGC-CTG-AAA-GAC-CGT-CAC-GAC— TTC-GGC-TTT-CCG— CAA-GAA- 
GAG-TTC-GAT-GGC-AAC-CAA-TTC-CAG-AAA-GCT-CAG-GCA-ATC-TCT- 
GTA-CTG-CAC-GAA— ATG-ATC— CAA-CAG-ACC-TTC— AAC-CTG-TTT-TCC- 
ACT-AAA-GAC-AGC—TCT— GCT— ACC-TGG-GAA-CAA— AGC-TTG-CTG-GAG- 
AAG-TTC-TCC-ACT-GAA-CTG-AAC-CAG-CAG-CTG-AAC-GAC-ATG-GAA- 
GCA-TGC-GTA-ATC-CAG-GAA-GTT-GGT-GTA-GAA-GAG-ACT-CCG-CTG- 
ATG-AAC— GTC— GAC— TCT— ATT— CTG-GCA-GTT— AAA— AAG— TAC— TTC-CAG— 
CGT-ATC— ACT— CTG— TAC— CTG— ACC-GAA-AAG— AAA— TAT— TCT— CCG— TGC— 
GCT-TGG-GAA-GTA-GTT-CGC-GCT-GAA-ATT-ATG-CGT-TCT-TTC-TCT- 
CTG-AGC-AAA— ATC-TTC-CAG-GAG-CGT-CTG-CGC-CGT-AAA-GAA-3 

41. A process for production of human leukocyte
interferon subtype F comprising:

growing, under appropriate nutrient conditions,
microorganisms transformed with biologically functional
DNA including a manufactured gene according to claim 39,
whereby said microorganisms express said gene and produce
human leukocyte interferon subtype F.



42. The polypeptide product of the process
of claim 41.

43. A manufactured gene capable of directing
the synthesis in a selected host microorganism of a
polypeptide which differs from human leukocyte interferon
subtype F in terms of the identity and/or location of
one or more amino acids.

44. A process for production of a polypeptide
which differs from human leukocyte interferon subtype
F in terms of the identity and/or location of one or
more amino acids comprising:

growing, under appropriate nutrient conditions,
microorganisms transformed with biologically functional
DNA including a manufactured gene according to claim 43,
whereby said microorganisms express said gene and produce
said polypeptide.

45. A polypeptide product of the process of
claim 44.

46. A polypeptide product according to claim
45 which is [Thr¹⁴, Met¹⁶]IFN-αF.

47. A reagent material comprising a radiolabelled
manufactured DNA sequence according to claim

48. A reagent material according to claim 47
wherein said radiolabel is ¹²⁵I.

49. A reagent material comprising a tagged
antibody to a polypeptide according to claim 12, coated
on the surface of a plastic bead.

50. A microbially synthesized, biologically
active proteinaceous product which differs from a
naturally-occurring form thereof in terms of the identity
and/or location of one or more amino acids and in terms
of one or more biological and pharmacological properties
but which substantially retains other such properties
of the naturally-occurring form.

51. A polypeptide product according to claim
50 selected from the group consisting of:

[Lys⁸¹]IFNγ;
[Ser¹, Ser³, Lys⁸¹]IFNγ;
[Lys¹, Lys², Gln³, Lys⁸¹]IFNγ;
[des-Cys¹, des-Tyr², des-Cys³, Lys⁸¹]IFNγ;
[Phe³⁹, Lys⁸¹]IFNγ;
[Thr⁴⁸, Lys⁸¹]IFNγ;
[Lys⁸¹, Cys⁹⁵]IFNγ;
[Cys⁹⁵]IFNγ;
[Cys⁹⁵, Pro¹⁰⁴]IFNγ;
[Gln²⁸]IFNγ;
[Glu⁵]IFNγ; and
[Thr¹⁴, Met¹⁶]IFN-αF.



52. A process for enhancing the expression
of an exogenous, vector-borne gene in E.coli, said
process comprising:

inserting in said vector, in a position upstream
of the exogenous gene protein coding sequence,
a DNA sequence comprising base pairs duplicative of
ribosome binding site base pairs associated with E.coli
synthesis of OMP-F, CRO and CAM gene products.

53. The process of claim 52 wherein the
inserted DNA sequence is a sequence associated with
E.coli synthesis of OMP-F gene products.

54. The process of claim 53 wherein the
inserted DNA sequence is:

   1       10        20
5'-AACCATGAGGGTAATAAATA-3'
3'-TTGGTACTCCCATTATTTAT-5'
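
As an illustrative aside that forms no part of the claims: the two strands recited above should pair base for base, and that can be verified with a few lines of code. The Python sketch below assumes ordinary Watson-Crick pairing (A with T, C with G); the function name `is_watson_crick_duplex` is hypothetical. The bottom strand is taken as printed, that is, in 3'-to-5' orientation, so it is compared position by position without reversal.

```python
# Watson-Crick pairing partners for DNA bases.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def is_watson_crick_duplex(top_5_to_3: str, bottom_3_to_5: str) -> bool:
    """Return True if every base of the top strand pairs with the base printed beneath it."""
    if len(top_5_to_3) != len(bottom_3_to_5):
        return False
    return all(
        COMPLEMENT.get(top_base) == bottom_base
        for top_base, bottom_base in zip(top_5_to_3, bottom_3_to_5)
    )


# The 20-base-pair sequence recited in claim 54.
TOP = "AACCATGAGGGTAATAAATA"
BOTTOM = "TTGGTACTCCCATTATTTAT"
print(is_watson_crick_duplex(TOP, BOTTOM))  # True
```

A mismatch at any position, or a base outside A, C, G and T, makes the check return False.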

55. The process of claim 53 wherein the
inserted DNA sequence follows a trp promoter/operator
sequence.